ADR 0029: NLP Provider Resilience and Insight Calibration¶
Status¶
Accepted
Problem¶
Area:NLP critical issues required a stronger contract for translation resilience, extraction fallback behavior, beat-stage quality, and insight grounding. The previous deterministic pipeline lacked explicit provider diagnostics, actionable degradation anomalies, and a hard rejection path for weak evidence alignment.
Non-goals¶
- Introducing external hosted NLP dependencies as mandatory runtime requirements.
- Changing the public story analysis schema version or breaking dashboard/API response payload compatibility.
- Replacing deterministic fallback behavior with non-deterministic model-only execution.
Public API¶
NLP pipeline behavior now includes:
- Configurable translation provider controls via environment:
STORY_GEN_TRANSLATION_PROVIDERSTORY_GEN_LANG_ID_PROVIDERSTORY_GEN_TRANSLATION_RETRY_COUNTSTORY_GEN_TRANSLATION_TIMEOUT_MSSTORY_GEN_TRANSLATION_CIRCUIT_FAILURESSTORY_GEN_TRANSLATION_CIRCUIT_RESET_SECONDS- Configurable extraction provider controls via environment:
STORY_GEN_EXTRACTION_PROVIDERSTORY_GEN_EXTRACTION_FORCE_FAIL- Optional insight style template control:
STORY_GEN_INSIGHT_STYLE_TEMPLATE
Pipeline and API diagnostics:
- Translation and extraction stages emit diagnostics with provider, fallback, and issue metadata.
- API anomaly sink records actionable degradation events:
translation_degradedextraction_degradedinsight_consistency_failed
Quality gate behavior:
- Adds evidence consistency rejection reason:
insight_evidence_inconsistent
Invariants¶
- Translation failures degrade gracefully through deterministic fallback and do not crash pipeline execution.
- Extraction provider failures degrade to deterministic fallback with confidence downgrade.
- Every beat retains explicit evidence segment references.
- Macro/meso/micro insight payload schema remains unchanged.
- Quality gate fails when insight evidence consistency drops below calibrated floor.
- NLP regression tests measure:
- non-English translation determinism
- extraction precision/recall and beat-stage agreement on gold inputs
- adversarial insight consistency rejection path
Test plan¶
- Add translation resilience tests with deterministic non-English corpus and provider-failure fallback assertions.
- Add extraction/beat gold-metric tests asserting precision, recall, and stage agreement thresholds.
- Add insight consistency quality-gate tests for both reject and pass paths.
- Add API-level anomaly tests verifying actionable metadata for translation and insight rejection degradations.
- Run full repository gates:
uv lock --checkuv run python tools/check_imports.pyuv run ruff check .uv run ruff format --check .uv run mypyuv run pytestuv run mkdocs build --strict