0032 Batch Pipeline and Re:Zero Benchmark

Problem

We need a reliable, apples-to-apples benchmark for end-to-end pipeline runs across large chapter sets. The existing canary is valuable but does not capture translation reliability, batch throughput, or chapter-by-chapter failure modes.

Non-goals

  • Shipping or committing third-party full text into the repository.
  • Changing the core pipeline architecture or schema versions.
  • Making online translation dependencies required by default.

Public API

  • New CLI: story-pipeline-batch
  • Batch outputs written to work/pipeline_runs/<run-id>/ with:
      ◦ summary.json (overall timings and counts; see the sketch below)
      ◦ chapters/*.json (per-chapter analysis summaries)
      ◦ translated_en/*.txt (optional translated chapter text)
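
To make the output contract concrete, here is a minimal sketch of how a batch run might write summary.json. Only "overall timings and counts" is specified above; the write_summary helper and every field name are illustrative assumptions, not part of this design.

    # Illustrative writer for summary.json. write_summary and all field
    # names are assumptions; the spec only requires timings and counts.
    import json
    from pathlib import Path

    def write_summary(run_dir: Path, stage_seconds: dict[str, float],
                      ok: int, failed: int) -> None:
        """Write work/pipeline_runs/<run-id>/summary.json for one batch run."""
        summary = {
            "chapters_total": ok + failed,                     # counts
            "chapters_ok": ok,
            "chapters_failed": failed,
            "wall_time_seconds": sum(stage_seconds.values()),  # overall timing
            "per_stage_seconds": stage_seconds,                # hypothetical breakdown
        }
        (run_dir / "summary.json").write_text(json.dumps(summary, indent=2))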

Invariants

  • Batch pipeline never commits source text into the repo.
  • Offline translation paths are supported via optional dependencies.
  • Timing metrics are always emitted for batch runs.
  • Failure in one chapter does not halt the batch unless --strict is set (see the sketch after this list).
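
The last invariant can be sketched as a simple per-chapter loop. This is illustrative only: ChapterResult and the run_chapter callable are hypothetical names, not the real batch runner's API.

    # Minimal sketch of failure isolation with a --strict escape hatch;
    # ChapterResult and run_chapter are illustrative, not the real API.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class ChapterResult:
        chapter_id: str
        ok: bool
        error: str | None = None

    def run_batch(chapter_ids: list[str],
                  run_chapter: Callable[[str], None],
                  strict: bool = False) -> list[ChapterResult]:
        """Process every chapter; only halt on a failure when strict is set."""
        results: list[ChapterResult] = []
        for chapter_id in chapter_ids:
            try:
                run_chapter(chapter_id)
                results.append(ChapterResult(chapter_id, ok=True))
            except Exception as exc:
                # Record the failure and keep going, so one bad chapter
                # cannot sink the rest of the batch.
                results.append(ChapterResult(chapter_id, ok=False, error=str(exc)))
                if strict:
                    raise  # --strict: the first failure aborts the batch
        return results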

Test plan

  • uv run pytest tests/test_pipeline_batch.py
  • uv run pytest tests/test_cli_entrypoints.py -k pipeline_batch