Verdict Observability

Mipiti's verdict pipeline normally answers one question per control: do the assertions submitted for this control suffice to prove it is implemented? The verdict-observability layer adds two more questions: does this control cover this control objective (coverage), and do this group's controls defeat the attacker (group sufficiency)? Each is backed by a cheap LLM call and surfaced where the operator can act on disagreement.

Observability is off by default. When the platform operator enables it (see Disabling and tuning), the LLM verdicts run in the background and surface as a divergence list on the model — operator-attributable triage work where the LLM judgment disagrees with what the model currently asserts. Committed coverage and compliance numbers are never changed by the observability layer; it only observes.

What this protects against

Two failure modes the structural assurance layer can't catch by itself:

Both cases are silent under the structural-only path. Verdict observability surfaces them so the operator can either accept the LLM's view (re-mapping the control, adding a missing layer) or dismiss it (the structural model was right; the LLM was wrong).

Reading the divergence list

A model's divergences live on the Verdict Divergence panel (one per model). Two sections:

Coverage divergences — one row per (control, control objective) pair where the LLM and the recorded mapping disagree, with a kind:

Each row carries the LLM's probability (p_covers), a one-sentence rationale, and the timestamp at which the verdict was computed.

Group-sufficiency divergences: one row per (control objective, mitigation group) where the LLM rates the group as clearly insufficient (p_suffices ≤ 0.3 by default). Each row is decorated with:

The confidence band

Not every LLM verdict surfaces. The divergence read endpoint applies a dual-band confidence floor so only confident disagreements show up:

Probabilities in the middle band — p_covers in [0.3, 0.7], or p_suffices > 0.3 — mean the LLM is uncertain. Surfacing those would generate triage work the operator would reasonably ignore. The verdicts are still cached for later inspection; they're just not surfaced as actionable rows.
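The banding can be sketched as a small filter. This is a minimal illustration, not Mipiti's implementation. The function and constant names are hypothetical. The 0.7 / 0.3 coverage bounds and the 0.3 group ceiling are the defaults quoted in this section. The kind name `spurious_mapping` is a placeholder, since only `missing_mapping` is named in this document. The sketch also assumes that `missing_mapping` means a confident "covers" verdict on a pair with no recorded mapping.

```python
from typing import Optional

# Defaults quoted in the text; deployments may tune them.
COVERAGE_HIGH = 0.7   # above this, the LLM is confident the control covers the CO
COVERAGE_LOW = 0.3    # below this, the LLM is confident it does not
GROUP_CEILING = 0.3   # at or below this, the group is rated clearly insufficient


def coverage_divergence(p_covers: float, is_mapped: bool,
                        high: float = COVERAGE_HIGH,
                        low: float = COVERAGE_LOW) -> Optional[str]:
    """Return a divergence kind, or None if the verdict should not surface."""
    if low <= p_covers <= high:
        return None  # middle band: the LLM is uncertain, so nothing surfaces
    if p_covers > high and not is_mapped:
        return "missing_mapping"
    if p_covers < low and is_mapped:
        return "spurious_mapping"  # hypothetical kind name
    return None  # confident verdict that agrees with the recorded mapping


def group_divergence(p_suffices: float, ceiling: float = GROUP_CEILING) -> bool:
    """True when the group-sufficiency verdict is confident enough to surface."""
    return p_suffices <= ceiling
```

A verdict that is confident but agrees with the recorded mapping returns `None` here, matching the idea that only confident disagreements become rows.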

The floors are tunable per deployment via environment variables — see Disabling and tuning.

Accepting a divergence

Coverage divergences support a one-click accept that applies the LLM's view as a mapping update:

A change reason is required (10-character minimum) — the same audit floor the manual co-mapping PATCH endpoint enforces. The accepted change records the operator's reasoning on the new version row, so the audit trail names what changed and why.
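A client can check the reason before submitting an accept, which avoids a round trip that would be rejected. The following is a minimal sketch: the function name is hypothetical, and trimming surrounding whitespace before counting is an assumption, not documented server behavior.

```python
MIN_REASON_LENGTH = 10  # audit floor quoted in the text


def validate_change_reason(reason: str) -> str:
    """Return the trimmed reason, or raise ValueError if it is too short."""
    trimmed = reason.strip()  # assumption: surrounding whitespace does not count
    if len(trimmed) < MIN_REASON_LENGTH:
        raise ValueError(
            f"change reason must be at least {MIN_REASON_LENGTH} characters"
        )
    return trimmed
```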

Group-sufficiency divergences are observation-only — there's no one-click accept because the natural fix isn't mechanical (it's "add a control" or "restructure the group", which is operator-judgment work). When the divergence list reports a critical group-sufficiency divergence, the action is in the methodology layer, not the click layer.

Stale divergences

Once a divergence is accepted (or fixed via any other route — manual remap, control refinement, etc.), the underlying inputs change, and the cached LLM verdict goes stale. The next entity mutation enqueues a re-evaluation; on completion, the divergence either updates with new probabilities or disappears entirely. Attempting to re-accept a divergence that no longer applies (e.g., the mapping was already added by something else) returns an explicit "divergence may be stale — re-fetch" error rather than silently no-op'ing.

If the surfaced list looks stale but no recent mutation has fired, POST /api/models/{id}/verdict-divergence/recompute force-enqueues a fresh re-evaluation for every control and live CO on the model. The worker debounces, so a flurry of recompute calls collapses to one re-eval per (model, control, kind) — safe to call from a UI "refresh" button.
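As a sketch of what a "refresh" button might send, the request can be built with the Python standard library. The endpoint path is the one given above. The base URL and the helper name are hypothetical, and authentication is deployment-specific and not shown.

```python
import urllib.parse
import urllib.request


def build_recompute_request(base_url: str, model_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that force-enqueues a re-evaluation."""
    # Quote the model id so unusual characters cannot alter the path.
    safe_id = urllib.parse.quote(model_id, safe="")
    url = f"{base_url.rstrip('/')}/api/models/{safe_id}/verdict-divergence/recompute"
    return urllib.request.Request(url, method="POST")
```

Sending it is then a matter of passing the request to `urllib.request.urlopen`, with whatever headers the deployment requires. Because the worker debounces, calling this repeatedly should not multiply the re-evaluation work.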

Filtering and pagination

The read endpoint supports two query parameters for triage at scale:

The summary block always reflects the full unfiltered state; only pagination.filtered_total reflects the kind-filtered subset. A response with summary.missing_mapping_count = 47 and pagination.filtered_total = 47 on a ?kind=missing_mapping request is the same model state viewed two ways — totals (for dashboards) and the active tab (for triage).
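The relationship between the two totals can be checked mechanically. The sketch below uses a response trimmed to the two fields named above, with the numbers from the example. The helper name is hypothetical, and the `<kind>_count` naming pattern is an assumption extrapolated from `missing_mapping_count`.

```python
from typing import Any, Dict

# Trimmed response using the example figures from the text; a real response
# carries more fields, which are omitted here.
sample_response: Dict[str, Any] = {
    "summary": {"missing_mapping_count": 47},
    "pagination": {"filtered_total": 47},
}


def tab_matches_summary(response: Dict[str, Any], kind: str) -> bool:
    """Check that a kind-filtered total agrees with the unfiltered summary count."""
    # Assumption: each kind has a summary field named "<kind>_count".
    summary_count = response["summary"][f"{kind}_count"]
    return summary_count == response["pagination"]["filtered_total"]
```

On a `?kind=missing_mapping` request, the two values are expected to agree, so a mismatch would suggest the list changed between reads or the filter was not applied.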

Disabling and tuning

The observability layer is gated by the DERIVATION_GRAPH_OBSERVABILITY_ENABLED environment variable on the backend, off by default. When off:

When enabled, three further env vars tune the surfacing thresholds:

A deployment that sees too few divergences can loosen the bounds (e.g., 0.6 / 0.4) to see more candidates; a deployment that finds the surface noisy and wants only the very confident disagreements can tighten them (e.g., 0.9 / 0.1).
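To make the effect of tuning concrete, the sketch below counts how many verdicts fall outside the middle band under each pair of bounds. The p_covers values are made up for illustration, the helper name is hypothetical, and 0.7 / 0.3 is taken as the default band implied by this page. Falling outside the band makes a verdict a candidate; it only surfaces if it also disagrees with the recorded mapping.

```python
from typing import Iterable


def count_outside_band(probabilities: Iterable[float], high: float, low: float) -> int:
    """Count verdicts confident enough to be divergence candidates."""
    return sum(1 for p in probabilities if p > high or p < low)


# Made-up p_covers values, for illustration only.
sample = [0.05, 0.15, 0.35, 0.5, 0.65, 0.8, 0.95]

settings = {
    "loosened (0.6 / 0.4)": (0.6, 0.4),
    "default (0.7 / 0.3)": (0.7, 0.3),
    "tightened (0.9 / 0.1)": (0.9, 0.1),
}

for label, (high, low) in settings.items():
    print(label, count_outside_band(sample, high, low))
# loosened (0.6 / 0.4) 6
# default (0.7 / 0.3) 4
# tightened (0.9 / 0.1) 2
```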

How it relates to evidence verification

Verdict observability runs in addition to the regular evidence-sufficiency verdict pipeline (Evidence Verification). The three verdict kinds answer three distinct questions:

| Verdict kind | Question | Affects committed math? |
| --- | --- | --- |
| sufficiency | Does the evidence on this control suffice to prove it is implemented? | Yes, feeds the Mitigated rollup. |
| coverage (this layer) | Does this control cover this control objective? | No, observation only. |
| group_sufficiency (this layer) | Do this group's controls defeat the attacker? | No, observation only. |

The split is deliberate: committed posture remains structural and deterministic (no LLM in the rollup); the observability layer adds LLM judgment as a separate signal the operator can act on, without ever silently overwriting authored or structurally-derived state.