# 0032 Batch Pipeline and Re:Zero Benchmark

## Problem
We need a reliable, apples-to-apples benchmark for end-to-end pipeline runs across large chapter sets. The existing canary is valuable but does not capture translation reliability, batch throughput, or chapter-by-chapter failure modes.
## Non-goals
- Shipping or committing third-party full text into the repository.
- Changing the core pipeline architecture or schema versions.
- Adding online translation dependencies as required defaults.
## Public API

- New CLI: `story-pipeline-batch`
- Batch outputs are written to `work/pipeline_runs/<run-id>/` with:
    - `summary.json` (overall timings and counts)
    - `chapters/*.json` (per-chapter analysis summaries)
    - `translated_en/*.txt` (optional translated chapter text)
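As an illustration of this layout, here is a minimal sketch of a batch runner that writes the same directory structure. `run_batch`, its parameters, and the exact summary fields are assumptions for this example only, not the pipeline's actual API.

```python
# Hedged sketch of the output layout above; run_batch and the summary
# fields are illustrative assumptions, not the real implementation.
import json
import time
from pathlib import Path


def run_batch(chapters: dict[str, str], run_id: str,
              root: Path = Path("work/pipeline_runs")) -> Path:
    run_dir = root / run_id
    (run_dir / "chapters").mkdir(parents=True, exist_ok=True)
    started = time.monotonic()

    for chapter_id, text in chapters.items():
        # Stand-in for the real per-chapter analysis.
        analysis = {"chapter": chapter_id, "chars": len(text)}
        (run_dir / "chapters" / f"{chapter_id}.json").write_text(
            json.dumps(analysis), encoding="utf-8")

    # Timing metrics are always emitted (see Invariants).
    summary = {
        "run_id": run_id,
        "chapters_total": len(chapters),
        "elapsed_seconds": round(time.monotonic() - started, 3),
    }
    (run_dir / "summary.json").write_text(
        json.dumps(summary, indent=2), encoding="utf-8")
    return run_dir
```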
## Invariants
- Batch pipeline never commits source text into the repo.
- Offline translation paths are supported via optional dependencies.
- Timing metrics are always emitted for batch runs.
- Failure in one chapter does not halt the batch unless `--strict` is set.
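A minimal sketch of the failure-isolation invariant, assuming a per-chapter `process_chapter` callable (a hypothetical name): failures are recorded and the batch continues, unless strict mode is requested.

```python
# Hedged sketch: per-chapter failure isolation. process_chapter and the
# error-record shape are illustrative assumptions, not the real API.
def run_chapters(chapter_ids, process_chapter, strict: bool = False):
    results, errors = [], []
    for chapter_id in chapter_ids:
        try:
            results.append(process_chapter(chapter_id))
        except Exception as exc:
            # Record the failure and keep going with the next chapter.
            errors.append({"chapter": chapter_id, "error": str(exc)})
            if strict:
                # --strict: surface the first failure and stop the batch.
                raise
    return results, errors
```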
## Test plan

- `uv run pytest tests/test_pipeline_batch.py`
- `uv run pytest tests/test_cli_entrypoints.py -k pipeline_batch`
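For orientation, a hedged sketch of what one of these tests might assert, reusing the hypothetical `run_batch` helper from the Public API sketch; the import path and asserted fields are assumptions, and the real tests may differ.

```python
# Hypothetical test sketch; relies on the run_batch helper sketched earlier.
import json

from story_pipeline.batch import run_batch  # hypothetical import path


def test_batch_writes_summary_and_chapter_files(tmp_path):
    chapters = {"ch001": "dummy text", "ch002": "more dummy text"}
    run_dir = run_batch(chapters, run_id="test-run", root=tmp_path)

    # The run directory should contain summary.json plus per-chapter files.
    summary = json.loads((run_dir / "summary.json").read_text(encoding="utf-8"))
    assert summary["chapters_total"] == 2
    assert (run_dir / "chapters" / "ch001.json").exists()
```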