ADR 0006: Story-First Feature Extraction and Schema Enforcement

Status

Accepted

Problem

We need a robust pipeline that starts from the story and then extracts features, with predictable schemas, so that downstream generation controls and evaluations stay stable over time.

Non-goals

  • Full semantic NLP stack rollout in this ADR.
  • Production analytics warehouse design.
  • Replacing existing story CRUD contracts.

Decision

Introduce a deterministic chapter feature extraction pipeline:

  • core extraction logic in core/story_feature_pipeline.py
  • persistence adapter in adapters/sqlite_feature_store.py
  • API endpoints for extraction and latest run retrieval
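
As a rough illustration of what "deterministic" could mean for the core extraction logic, the sketch below derives features from chapter text using pure functions only. The feature names (word_count, sentence_count, dialogue_line_count), the ChapterFeatures type, and both function names are hypothetical; this ADR does not specify the actual feature set or the API of core/story_feature_pipeline.py.

```python
import re
from dataclasses import asdict, dataclass

SCHEMA_VERSION = "story_features.v1"  # schema constant named in this ADR


@dataclass(frozen=True)
class ChapterFeatures:
    # Hypothetical feature set; the ADR does not list the real features.
    chapter_index: int
    word_count: int
    sentence_count: int
    dialogue_line_count: int


def extract_chapter_features(chapter_index: int, text: str) -> ChapterFeatures:
    """Pure function: no clock, randomness, or I/O, so output depends only on input."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Count lines that open with a double quote as dialogue lines.
    dialogue = [ln for ln in text.splitlines() if ln.lstrip().startswith('"')]
    return ChapterFeatures(chapter_index, len(words), len(sentences), len(dialogue))


def extract_story_features(chapters: list[str]) -> dict:
    """Extract features chapter by chapter, preserving chapter order."""
    return {
        "schema_version": SCHEMA_VERSION,
        "chapters": [
            asdict(extract_chapter_features(i, text))
            for i, text in enumerate(chapters)
        ],
    }
```

Because nothing here reads the clock, a random source, or external state, running the extraction twice on the same chapters yields equal results, which is the property downstream evaluations would rely on.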

Adopt explicit schema/version enforcement:

  • strict Pydantic contracts that reject unknown fields (extra="forbid")
  • schema constant story_features.v1
  • table-level version registry (feature_schema_versions)
  • fail-fast on schema mismatch
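
The registry and fail-fast behavior could look roughly like the following sketch, which uses only the standard-library sqlite3 module. The feature_schema_versions table name and the story_features.v1 constant come from this ADR; the column names, the story_feature_runs table name, the SchemaVersionMismatch exception, and the ensure_schema_version helper are assumptions made for illustration. The Pydantic side of the contract is not shown.

```python
import sqlite3

SCHEMA_VERSION = "story_features.v1"  # schema constant named in this ADR
FEATURE_TABLE = "story_feature_runs"  # hypothetical name of the versioned table


class SchemaVersionMismatch(RuntimeError):
    """Raised when the stored schema version differs from the expected one."""


def ensure_schema_version(
    conn: sqlite3.Connection, expected: str = SCHEMA_VERSION
) -> None:
    # Registry table from the ADR; the column layout is an assumption.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature_schema_versions ("
        " table_name TEXT PRIMARY KEY,"
        " schema_version TEXT NOT NULL)"
    )
    row = conn.execute(
        "SELECT schema_version FROM feature_schema_versions WHERE table_name = ?",
        (FEATURE_TABLE,),
    ).fetchone()
    if row is None:
        # First use: record the version this code was written against.
        conn.execute(
            "INSERT INTO feature_schema_versions (table_name, schema_version)"
            " VALUES (?, ?)",
            (FEATURE_TABLE, expected),
        )
        conn.commit()
    elif row[0] != expected:
        # Fail fast: refuse to continue rather than read or write mixed schemas.
        raise SchemaVersionMismatch(
            f"{FEATURE_TABLE}: stored {row[0]!r}, expected {expected!r}"
        )
```

Calling such a check before every read and write keeps a mismatch from silently producing rows in two incompatible shapes; the caller sees an exception and has to migrate first.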

Public API

New endpoints:

  • POST /api/v1/stories/{story_id}/features/extract
  • GET /api/v1/stories/{story_id}/features/latest

New CLI:

  • story-features

Invariants

  • extraction starts from persisted story blueprint chapters
  • feature run is owner-scoped and story-scoped
  • schema version mismatch blocks writes/reads until migrated
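
The owner-scoped and story-scoped invariant can be illustrated with a small persistence sketch in which every read filters on both identifiers. The table layout, column names, and the init_store, save_run, and latest_run helpers are hypothetical; the real adapter in adapters/sqlite_feature_store.py may be organized differently.

```python
import json
import sqlite3
from typing import Optional


def init_store(conn: sqlite3.Connection) -> None:
    # Hypothetical table layout for feature runs.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS story_feature_runs ("
        " run_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " owner_id TEXT NOT NULL,"
        " story_id TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )


def save_run(
    conn: sqlite3.Connection, owner_id: str, story_id: str, features: dict
) -> int:
    cur = conn.execute(
        "INSERT INTO story_feature_runs (owner_id, story_id, payload)"
        " VALUES (?, ?, ?)",
        (owner_id, story_id, json.dumps(features, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def latest_run(
    conn: sqlite3.Connection, owner_id: str, story_id: str
) -> Optional[dict]:
    # Both owner_id and story_id are required, so one owner can never
    # read another owner's run even when story ids collide.
    row = conn.execute(
        "SELECT payload FROM story_feature_runs"
        " WHERE owner_id = ? AND story_id = ?"
        " ORDER BY run_id DESC LIMIT 1",
        (owner_id, story_id),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Making both identifiers mandatory parameters, rather than optional filters, means a caller cannot accidentally issue an unscoped query.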

Test plan

  • unit tests for core extraction behavior
  • adapter tests for persistence and schema mismatch failures
  • API tests for extraction lifecycle and owner isolation
  • contract tests for docs/entrypoints updates