Story Feature Pipeline

This pipeline starts from a saved story blueprint and extracts deterministic chapter-level metrics that can later feed evaluators, drift checks, and generation controls.

Flow

1. Load the story blueprint from storage.
2. Convert chapters to extraction inputs.
3. Extract feature rows with a versioned schema (story_features.v1).
4. Persist run metadata and rows to SQLite tables.
5. Query the latest run for analysis or downstream generation setup.

Current extracted features

source length (chars)
sentence count
token count
average sentence length
dialogue line ratio
top keywords
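
The metrics above can be sketched as one pure function over chapter text. This is a minimal illustration, not the pipeline's actual extractor: the function name, the stop-word list, the sentence and dialogue heuristics, and every dictionary key other than top_keywords_json are assumptions made for the example.

```python
import json
import re
from collections import Counter

# Hypothetical stop-word list; the real pipeline may use a different one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}


def extract_chapter_features(text: str, top_k: int = 3) -> dict:
    """Compute deterministic chapter-level metrics from raw chapter text."""
    # Sentences: split after ., ! or ? when followed by whitespace (rough heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Tokens: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: a line counts as dialogue if it opens with a quotation mark.
    dialogue = [ln for ln in lines if ln.lstrip().startswith(('"', "\u201c"))]
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    # Sort by (-count, word) so ties break alphabetically and output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top = [word for word, _ in ranked[:top_k]]
    return {
        "source_length_chars": len(text),
        "sentence_count": len(sentences),
        "token_count": len(tokens),
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "dialogue_line_ratio": len(dialogue) / len(lines) if lines else 0.0,
        # Keywords are serialized to a JSON string so they fit one TEXT column.
        "top_keywords_json": json.dumps(top),
    }


sample = 'The storm broke at dawn.\n"Run," said Mara.\nMara ran to the storm cellar.'
features = extract_chapter_features(sample)
```

Because the function uses no randomness and breaks keyword ties alphabetically, the same chapter text always yields the same row, which is what makes the metrics usable as drift baselines.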

Table schema (SQLite)

feature_schema_versions: records the expected schema version for the feature tables
story_feature_runs: one row per extraction run (run_id, story_id, owner_id, version, timestamp)
story_feature_rows: one row per chapter within a run
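
As a rough sketch of how these three tables could be laid out and queried, the example below uses Python's built-in sqlite3 module. Only the table names and the story_feature_runs columns come from the description above. The columns of the other two tables, the latest_run_id helper, the sample run IDs, and the ISO-8601 text timestamps are assumptions for illustration.

```python
import sqlite3

SCHEMA_VERSION = "story_features.v1"

# Column layouts for feature_schema_versions and story_feature_rows are hypothetical.
DDL = """
CREATE TABLE IF NOT EXISTS feature_schema_versions (
    schema_name    TEXT PRIMARY KEY,
    schema_version TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS story_feature_runs (
    run_id    TEXT PRIMARY KEY,
    story_id  TEXT NOT NULL,
    owner_id  TEXT NOT NULL,
    version   TEXT NOT NULL,
    timestamp TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS story_feature_rows (
    run_id            TEXT NOT NULL REFERENCES story_feature_runs(run_id),
    chapter_index     INTEGER NOT NULL,
    token_count       INTEGER NOT NULL,
    top_keywords_json TEXT NOT NULL,
    PRIMARY KEY (run_id, chapter_index)
);
"""


def latest_run_id(conn: sqlite3.Connection, owner_id: str, story_id: str):
    """Return the newest run_id for one owner's story, or None if there is none."""
    row = conn.execute(
        "SELECT run_id FROM story_feature_runs "
        "WHERE owner_id = ? AND story_id = ? "
        "ORDER BY timestamp DESC LIMIT 1",
        (owner_id, story_id),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
# Sample data only; ISO-8601 strings sort chronologically as plain text.
conn.executemany(
    "INSERT INTO story_feature_runs VALUES (?, ?, ?, ?, ?)",
    [
        ("run-1", "story-a", "owner-1", SCHEMA_VERSION, "2024-01-01T00:00:00Z"),
        ("run-2", "story-a", "owner-1", SCHEMA_VERSION, "2024-01-02T00:00:00Z"),
    ],
)
conn.execute(
    "INSERT INTO story_feature_rows VALUES (?, ?, ?, ?)",
    ("run-2", 0, 14, '["mara", "storm", "broke"]'),
)
latest = latest_run_id(conn, "owner-1", "story-a")
```

Filtering on owner_id inside the query, rather than after fetching, keeps the lookup owner-scoped even if a caller passes a story_id that belongs to someone else.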

Schema enforcement best practices implemented

strict pydantic models (extra="forbid") on contracts
explicit schema version constant (story_features.v1)
startup version check against feature_schema_versions
fail-fast behavior on schema mismatch
stable field naming and typed serialization (top_keywords_json)
owner-scoped read/write paths in API
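
The startup version check and fail-fast items might look roughly like the sketch below, written with the standard-library sqlite3 module. The schema_name and schema_version columns, the SchemaVersionMismatch exception, and the check_schema_version function are hypothetical names, not the project's actual API.

```python
import sqlite3

SCHEMA_VERSION = "story_features.v1"


class SchemaVersionMismatch(RuntimeError):
    """Raised at startup when the stored version differs from the code's constant."""


def check_schema_version(conn: sqlite3.Connection, expected: str = SCHEMA_VERSION) -> None:
    """Fail fast if feature_schema_versions does not record the expected version."""
    row = conn.execute(
        "SELECT schema_version FROM feature_schema_versions WHERE schema_name = ?",
        ("story_features",),
    ).fetchone()
    found = row[0] if row else None
    if found != expected:
        # Refuse to continue rather than read or write rows of an unknown shape.
        raise SchemaVersionMismatch(f"expected {expected!r}, found {found!r}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_schema_versions ("
    "schema_name TEXT PRIMARY KEY, schema_version TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO feature_schema_versions VALUES (?, ?)",
    ("story_features", SCHEMA_VERSION),
)
check_schema_version(conn)  # matching version: returns without raising
```

A missing version row is treated the same as a wrong one, so an uninitialized database also stops the service at startup instead of failing later on the first write.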

Next hardening steps

add migration scripts for version upgrades (v2, v3, ...)
add per-field table CHECK constraints where useful
add semantic drift baselines keyed by schema version
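
To illustrate the CHECK-constraint step, here is one possible shape for per-field constraints on story_feature_rows. The column names and the chosen bounds are assumptions for the example; the real table may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column set; each CHECK encodes a range the metric can never leave.
conn.execute(
    """
    CREATE TABLE story_feature_rows (
        run_id              TEXT NOT NULL,
        chapter_index       INTEGER NOT NULL CHECK (chapter_index >= 0),
        token_count         INTEGER NOT NULL CHECK (token_count >= 0),
        dialogue_line_ratio REAL NOT NULL
            CHECK (dialogue_line_ratio BETWEEN 0.0 AND 1.0),
        PRIMARY KEY (run_id, chapter_index)
    )
    """
)


def try_insert(chapter_index: int, ratio: float) -> bool:
    """Insert one row; return False if SQLite rejects it on a constraint."""
    try:
        conn.execute(
            "INSERT INTO story_feature_rows VALUES (?, ?, ?, ?)",
            ("run-1", chapter_index, 100, ratio),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(0, 0.25)  # ratio within [0, 1]
rejected = try_insert(1, 1.5)   # ratio out of range, violates the CHECK
```

Constraints like these catch extractor bugs at write time, in addition to whatever validation the application-level contracts already perform.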

Contract tracking

Pipeline stage and schema contracts are tracked in a shared registry:

Runtime registry: src/story_gen/api/contract_registry.py
Exported snapshot: work/contracts/story_pipeline_contract_registry.v1.json

Regenerate snapshot after contract changes:

make contracts-export