# Story Feature Pipeline
This pipeline starts from a saved story blueprint and extracts deterministic chapter-level metrics that can later feed evaluators, drift checks, and generation controls.
## Flow
- Load story blueprint from storage.
- Convert chapters to extraction inputs.
- Extract feature rows with a versioned schema (`story_features.v1`).
- Persist run metadata and rows to SQLite tables.
- Query latest run for analysis or downstream generation setup.
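The flow above can be sketched as a single function. This is a hypothetical sketch, not the project's actual API: the function name `run_pipeline`, the blueprint dict shape, and the column names beyond those stated in this document (e.g. `chapter_index`, `created_at`) are assumptions.

```python
# Hypothetical sketch of the pipeline flow; names not stated in the docs
# (run_pipeline, chapter_index, created_at) are assumptions.
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA_VERSION = "story_features.v1"


def run_pipeline(blueprint: dict, conn: sqlite3.Connection) -> str:
    """Persist one run-metadata row, then one feature row per chapter."""
    run_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO story_feature_runs (run_id, story_id, owner_id, version, created_at)"
        " VALUES (?, ?, ?, ?, ?)",
        (run_id, blueprint["story_id"], blueprint["owner_id"], SCHEMA_VERSION,
         datetime.now(timezone.utc).isoformat()),
    )
    for idx, chapter in enumerate(blueprint["chapters"]):
        # A real extractor would compute every feature listed in this document;
        # source length (chars) stands in for the full row here.
        conn.execute(
            "INSERT INTO story_feature_rows (run_id, chapter_index, source_length)"
            " VALUES (?, ?, ?)",
            (run_id, idx, len(chapter["text"])),
        )
    conn.commit()
    return run_id
```

Passing the connection in (rather than opening it inside the function) keeps the run transactional and easy to test against an in-memory database.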
## Current extracted features
- source length (chars)
- sentence count
- token count
- average sentence length
- dialogue line ratio
- top keywords
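A minimal sketch of how these deterministic features could be computed from raw chapter text. The sentence splitter, tokenizer, and dialogue heuristic below are naive assumptions for illustration, not the project's implementation.

```python
# Naive, deterministic feature extraction; the regexes and the
# leading-quote dialogue heuristic are illustrative assumptions.
import re
from collections import Counter


def extract_features(text: str, top_k: int = 5) -> dict:
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    lines = [ln for ln in text.splitlines() if ln.strip()]
    dialogue_lines = [ln for ln in lines if ln.lstrip().startswith('"')]
    return {
        "source_length": len(text),                                   # chars
        "sentence_count": len(sentences),
        "token_count": len(tokens),
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),  # tokens/sentence
        "dialogue_line_ratio": len(dialogue_lines) / max(len(lines), 1),
        "top_keywords": [w for w, _ in Counter(tokens).most_common(top_k)],
    }
```

Because every step is rule-based, the same input always yields the same row, which is what makes these features usable as drift-check baselines.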
## Table schema (SQLite)
- `feature_schema_versions`: records the expected schema version for feature tables
- `story_feature_runs`: one row per extraction run (`run_id`, `story_id`, `owner_id`, version, timestamp)
- `story_feature_rows`: one row per chapter within a run
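The three tables above might be created with DDL along these lines. Column names beyond those the document states (e.g. `chapter_index`, `created_at`, `table_name`) are assumptions about a plausible layout, not the project's actual schema.

```python
# Illustrative DDL for the tables described above; columns not named in the
# docs are assumptions.
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS feature_schema_versions (
    table_name TEXT PRIMARY KEY,
    version    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS story_feature_runs (
    run_id     TEXT PRIMARY KEY,
    story_id   TEXT NOT NULL,
    owner_id   TEXT NOT NULL,
    version    TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS story_feature_rows (
    run_id            TEXT NOT NULL REFERENCES story_feature_runs(run_id),
    chapter_index     INTEGER NOT NULL,
    top_keywords_json TEXT NOT NULL DEFAULT '[]',
    PRIMARY KEY (run_id, chapter_index)
);
"""


def init_db(conn: sqlite3.Connection) -> None:
    """Create all feature tables if they do not already exist."""
    conn.executescript(DDL)
```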
## Schema enforcement best practices implemented
- strict pydantic models (`extra="forbid"`) on contracts
- explicit schema version constant (`story_features.v1`)
- startup version check against `feature_schema_versions`
- fail-fast behavior on schema mismatch
- stable field naming and typed serialization (`top_keywords_json`)
- owner-scoped read/write paths in the API
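A sketch of what the first four practices above can look like together, using pydantic v2. The model fields and the `check_schema_version` helper are illustrative assumptions, not the project's actual contract code.

```python
# Sketch of strict contracts plus a fail-fast startup version check;
# the model fields and helper name are assumptions.
import sqlite3

from pydantic import BaseModel, ConfigDict

SCHEMA_VERSION = "story_features.v1"  # explicit schema version constant


class StoryFeatureRow(BaseModel):
    # extra="forbid": any field not declared here raises a ValidationError.
    model_config = ConfigDict(extra="forbid")

    run_id: str
    chapter_index: int
    top_keywords_json: str  # typed serialization of the keyword list


def check_schema_version(conn: sqlite3.Connection,
                         table: str = "story_feature_rows") -> None:
    """Fail fast at startup if the stored version differs from the code's constant."""
    row = conn.execute(
        "SELECT version FROM feature_schema_versions WHERE table_name = ?", (table,)
    ).fetchone()
    if row is None or row[0] != SCHEMA_VERSION:
        raise RuntimeError(
            f"schema mismatch for {table}: found {row}, expected {SCHEMA_VERSION}"
        )
```

Rejecting unknown fields at the contract boundary means a producer that starts emitting a renamed column fails loudly at ingest time instead of silently writing nulls.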
## Next hardening steps
- add migration scripts for version upgrades (`v2`, `v3`, ...)
- add per-field table CHECK constraints where useful
- add semantic drift baselines keyed by schema version
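One possible shape for the CHECK-constraint step above: push range invariants into the table itself so out-of-range rows are rejected even if they bypass the pydantic layer. The `_v2` table name, columns, and bounds are illustrative assumptions.

```python
# Hypothetical per-field CHECK constraints; table name, columns, and
# bounds are assumptions about a future v2 schema.
import sqlite3

DDL_WITH_CHECKS = """
CREATE TABLE story_feature_rows_v2 (
    run_id              TEXT NOT NULL,
    chapter_index       INTEGER NOT NULL CHECK (chapter_index >= 0),
    dialogue_line_ratio REAL NOT NULL CHECK (dialogue_line_ratio BETWEEN 0.0 AND 1.0),
    PRIMARY KEY (run_id, chapter_index)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL_WITH_CHECKS)
conn.execute("INSERT INTO story_feature_rows_v2 VALUES ('r1', 0, 0.25)")  # in range
try:
    conn.execute("INSERT INTO story_feature_rows_v2 VALUES ('r1', 1, 1.5)")  # ratio > 1.0
except sqlite3.IntegrityError:
    pass  # the CHECK constraint rejects out-of-range values
```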
## Contract tracking
Pipeline stage and schema contracts are tracked in a shared registry:
- Runtime registry: `src/story_gen/api/contract_registry.py`
- Exported snapshot: `work/contracts/story_pipeline_contract_registry.v1.json`
Regenerate the snapshot after contract changes with `make contracts-export`.