Romanwerk is the pipeline that produces publishable long-form German fiction without human intervention. Two complete novels have shipped end-to-end. The interesting part is not the output; it is what had to be solved to make multi-agent, long-horizon autonomy actually hold together across 50 chapters, where every small error compounds.

Feedback loops that existed on paper, not in execution

The Prose-Critique agent (P3B) ran as a background job in parallel to the Reviewer (P4), but its output was never wired back into the Repair-Writer (P5). The repair agent was rewriting blind.
The P4 Reviewer, the single agent deciding HOLD vs. PASS for an entire chapter, was running on Haiku: reliable for syntax checks, unreliable for complex MAJOR_REPAIR verdicts.
The Writer never saw the closing ~500 words of the previous chapter. Chapter openings silently repeated the tone, phrases, and beats of their predecessors. Invisible to syntax validators, obvious to a reader.
Audit warnings produced by the five P4A specialist auditors (system, numeric-timeline, cast-agency, narrative-drift, cosmos) were written to disk and died there. The next chapter's Planner never saw them, so every chapter repeated the same mistakes from scratch.
Rolling narrative digests silently overflowed the context budget. Early chapters dominated the prompt; recent chapters became invisible. Quality degraded in a pattern that looked like model regression but was actually context truncation.
The Strategic Planner (P1) regularly produced creative briefs missing five mandatory schema fields. Validation rejected them, the Director retried, the agent produced the same incomplete brief, and the pipeline deadlocked.
What changed

P3B critique findings are now extracted and injected as repair_hint into P5 before the Repair-Writer starts. The pipeline no longer trusts that "running in parallel" means "feeding downstream".
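A minimal sketch of that wiring. The function names, the critique shape, and the `actionable` flag are illustrative assumptions, not Romanwerk's actual identifiers; only `repair_hint` comes from the description above.

```python
def extract_repair_hints(critique: dict) -> list[str]:
    """Pull the actionable findings out of a P3B critique report (assumed shape)."""
    return [f["note"] for f in critique.get("findings", []) if f.get("actionable")]

def build_p5_input(chapter_text: str, critique: dict) -> dict:
    """Assemble the Repair-Writer input with the critique wired in as repair_hint."""
    return {
        "chapter": chapter_text,
        "repair_hint": extract_repair_hints(critique),
    }

critique = {"findings": [
    {"note": "Opening repeats the previous chapter's imagery", "actionable": True},
    {"note": "Stylistic observation only", "actionable": False},
]}
task = build_p5_input("…chapter text…", critique)
```

The point is the explicit hand-off: P5 receives the hints as part of its input, rather than the pipeline assuming a parallel job's output reaches it.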
Quality gates run on the strongest available model (Sonnet). Content generation runs on the most cost-efficient model that meets the bar. Decisions and creation are no longer priced the same.
The last ~500 words of the previous chapter, plus LAST_CHAPTER_DIGEST, are injected as PRE_TASK_INJECT before every P3 write, starting at chapter 002.
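A sketch of that assembly, assuming a word-based tail and a simple labeled-sections prompt layout (both assumptions; only PRE_TASK_INJECT and LAST_CHAPTER_DIGEST are named in the source).

```python
def chapter_tail(text: str, n_words: int = 500) -> str:
    """Return roughly the last n_words words of a chapter."""
    return " ".join(text.split()[-n_words:])

def pre_task_inject(prev_chapter: str, last_chapter_digest: str) -> str:
    """Build the context the Writer sees before every P3 write (from chapter 002 on)."""
    return (
        "PREVIOUS_ENDING:\n" + chapter_tail(prev_chapter)
        + "\n\nLAST_CHAPTER_DIGEST:\n" + last_chapter_digest
    )
```

With the actual closing words in front of it, the Writer can avoid echoing the previous chapter's phrasing instead of accidentally reproducing it.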
P4A warnings from chapter N are injected into the Planner's context for chapter N+1 as audit_warnings[]. Mistakes surface once, then propagate into plans instead of repairs.
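A sketch of the feed-forward step. The report shape is an assumption based on the description; `audit_warnings` is the key named above.

```python
def carry_forward_warnings(audit_reports: list[dict]) -> list[str]:
    """Flatten chapter N's auditor reports into one warning list."""
    return [w for report in audit_reports for w in report.get("warnings", [])]

def planner_context(plan_request: dict, audit_reports: list[dict]) -> dict:
    """Attach chapter N's warnings to the chapter N+1 Planner call."""
    return {**plan_request, "audit_warnings": carry_forward_warnings(audit_reports)}
```

The warnings become planning input rather than post-hoc repair input, which is what lets a mistake surface once instead of recurring every chapter.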
Digests are split into three tiers: digest_full (never injected), digest_recent (last 3 chapters, verbatim), and digest_summary (older chapters, one sentence each, Haiku-compressed). This keeps long-horizon memory inside the context budget.
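The tiering can be sketched as below. `compress` stands in for the Haiku summarisation call, and digest_full is assumed to live on disk only, so it does not appear in the injected output.

```python
def compress(digest: str) -> str:
    """Stand-in for Haiku compression: keep only the first sentence."""
    return digest.split(". ")[0].rstrip(".") + "."

def tiered_digest(digests: list[str], recent_n: int = 3) -> dict:
    """Split rolling per-chapter digests into the two injected tiers."""
    return {
        "digest_recent": digests[-recent_n:],                          # verbatim
        "digest_summary": [compress(d) for d in digests[:-recent_n]],  # one sentence each
    }
```

The effect is a fixed-shape prompt: recent chapters stay verbatim, old chapters shrink to one sentence, and nothing silently overflows the budget as the chapter count grows.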
A Python post-processor fills P1's missing schema fields from adjacent valid data. LLM retries are reserved for problems that actually need judgment.
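A minimal sketch of that backfill. The field names are invented placeholders, not P1's actual schema; the source only states that missing mandatory fields are filled deterministically from adjacent valid data.

```python
# Placeholder schema fields for illustration only.
REQUIRED_FIELDS = ("pov", "setting", "stakes", "tone", "beats")

def backfill_brief(brief: dict, last_valid_brief: dict) -> dict:
    """Fill missing mandatory fields from the last brief that passed validation."""
    patched = dict(brief)
    for field in REQUIRED_FIELDS:
        patched.setdefault(field, last_valid_brief[field])
    return patched
```

Because the backfill is deterministic, the retry loop cannot deadlock on a purely structural gap; the LLM is only re-invoked when the content itself needs judgment.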