Romanwerk — Postmortem

100 chapters, over 900 artefacts, no human in the loop.

Romanwerk is the pipeline that produces publishable German long-form fiction without human intervention. Four complete novels delivered end-to-end. The interesting part isn't the output — it's what had to be solved so multi-agent autonomy actually holds across 100 chapters, where every small error compounds.

Failure modes.

Feedback loops that existed on paper, not in execution

The Prose-Critique agent (P3B) ran as a background job in parallel to the Reviewer (P4). Its output was never wired back into the Repair-Writer (P5). The repair agent was rewriting blind.

SEVERITY critical | DISCOVERED VIA multi-agent interview | STATUS FIXED

Quality gates running on the cheapest model

The P4 Reviewer — the single agent deciding HOLD vs. PASS for an entire chapter — was running on Haiku. Reliable for syntax checks, unreliable for complex MAJOR_REPAIR verdicts.

SEVERITY critical | DISCOVERED VIA reviewer A/B audit | STATUS FIXED

Cross-chapter amnesia

The Writer never saw the closing ~500 words of the previous chapter. Chapter openings silently repeated the tone, phrases and beats of their predecessors. Invisible to syntax validators, obvious to a reader.

SEVERITY high | DISCOVERED VIA reader review | STATUS FIXED

Reactive repair instead of proactive planning

Audit warnings produced by the five P4A specialist auditors (system, numeric-timeline, cast-agency, narrative-drift, cosmos) were written to disk and died there. The next chapter's Planner never saw them, so every chapter repeated the same mistakes.

SEVERITY high | DISCOVERED VIA archivist perspective | STATUS FIXED

Digest drift starting at chapter 8

Rolling narrative digests silently overflowed the context budget. Early chapters dominated the prompt; recent chapters became invisible. Quality degraded in a pattern that looked like model regression but was actually context truncation.

SEVERITY high | DISCOVERED VIA correlation audit | STATUS FIXED

Schema compliance as an infinite retry loop

The strategic-planner (P1) regularly produced creative briefs that were missing five mandatory schema fields. Validation rejected them, the Director retried, the agent produced the same creative output, and the pipeline deadlocked.

SEVERITY blocking | DISCOVERED VIA timeout analysis | STATUS FIXED

Engineered response.

Explicit cross-phase wiring

↳ answers: #1

P3B critique findings are now extracted and injected as repair_hint into P5 before the repair-writer starts. The pipeline no longer trusts that "running in parallel" means "feeding downstream".
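
A minimal sketch of that wiring, assuming findings arrive as simple records. The class names, field names and the `build_repair_task` helper are hypothetical; only `repair_hint` comes from the pipeline itself. The point it illustrates: P5 input cannot be built unless the P3B output has actually been read.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CritiqueFinding:
    # Hypothetical shape of one P3B finding; the real schema is not shown here.
    paragraph: int
    issue: str


@dataclass
class RepairTask:
    chapter: int
    draft: str
    repair_hint: list[str] = field(default_factory=list)


def build_repair_task(
    chapter: int, draft: str, findings: list[CritiqueFinding] | None
) -> RepairTask:
    """Build the P5 input and refuse to run if the P3B critique never arrived."""
    if findings is None:
        # A missing critique means the loop is not wired: fail loudly
        # instead of letting the repair agent rewrite blind.
        raise RuntimeError(f"chapter {chapter}: P3B critique missing")
    hints = [f"paragraph {f.paragraph}: {f.issue}" for f in findings]
    return RepairTask(chapter=chapter, draft=draft, repair_hint=hints)
```

An empty list is a valid result (the critique ran and found nothing); `None` means the critique was never delivered, and that is treated as an error.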

Quality-gate model tiering

↳ answers: #2

Quality gates run on the strongest available model (Sonnet). Content generation runs on the most cost-efficient model that meets the bar. Decisions and creation are no longer priced the same.
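
One way to express the tiering is a lookup by role rather than a global default. This is a sketch under assumptions: the role split per phase is inferred from the phase descriptions above, and the model strings are placeholders rather than real model IDs.

```python
from enum import Enum


class Role(Enum):
    DECISION = "decision"  # gates such as the P4 Reviewer
    CREATION = "creation"  # drafting and repair phases


# Placeholder names, not real model IDs. Only "Sonnet for gates" is stated
# in this postmortem; the creation tier is whatever cheap model meets the bar.
MODEL_FOR_ROLE = {
    Role.DECISION: "sonnet",
    Role.CREATION: "cost-efficient-model",
}

# Assumed role per phase, inferred from the phase names used here.
PHASE_ROLE = {
    "P3": Role.CREATION,
    "P4": Role.DECISION,
    "P5": Role.CREATION,
}


def model_for_phase(phase: str) -> str:
    """Pick a model by what the phase does, not by a pipeline-wide default."""
    # An unknown phase raises KeyError on purpose: a new gate should not
    # silently inherit the cheap tier.
    return MODEL_FOR_ROLE[PHASE_ROLE[phase]]
```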

Cross-chapter memory injection

↳ answers: #3

Last 500 words of the previous chapter + LAST_CHAPTER_DIGEST are injected as PRE_TASK_INJECT before every P3 write, starting at chapter 002.
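
The injection step can be sketched as follows, assuming whitespace-separated word counting and a plain-text inject block. The function names and the `PREVIOUS_CHAPTER_TAIL` label are illustrative; `LAST_CHAPTER_DIGEST` and the 500-word window come from the description above.

```python
from __future__ import annotations


def tail_words(text: str, limit: int = 500) -> str:
    """Return the last `limit` whitespace-separated words of `text`."""
    words = text.split()
    # Joining with single spaces drops paragraph breaks; acceptable for a
    # tone-and-phrasing reference, not for quoting.
    return " ".join(words[-limit:])


def build_pre_task_inject(
    chapter: int, previous_chapter: str | None, last_digest: str | None
) -> str:
    """Assemble the block prepended to the P3 prompt."""
    # Chapter 1 has no predecessor, so injection starts at chapter 2.
    if chapter < 2 or previous_chapter is None:
        return ""
    parts = ["PREVIOUS_CHAPTER_TAIL:", tail_words(previous_chapter)]
    if last_digest:
        parts += ["LAST_CHAPTER_DIGEST:", last_digest]
    return "\n".join(parts)
```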

Audit-to-planner backpressure

↳ answers: #4

P4A warnings from chapter N are injected into the Planner's context for chapter N+1 as audit_warnings[]. Mistakes surface once, then propagate into plans instead of repairs.
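
A sketch of the hand-off, assuming each auditor writes one JSON file per chapter. The file naming (`007_numeric-timeline.json`) and the `{"warnings": [...]}` layout are assumptions for illustration; only the `audit_warnings` key is taken from the pipeline.

```python
import json
from pathlib import Path


def load_audit_warnings(audit_dir: Path, chapter: int) -> list[str]:
    """Collect the warnings all auditors wrote for one chapter."""
    warnings: list[str] = []
    # Assumed layout: <chapter, zero-padded to 3 digits>_<auditor>.json
    for path in sorted(audit_dir.glob(f"{chapter:03d}_*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        warnings.extend(data.get("warnings", []))
    return warnings


def planner_context(audit_dir: Path, next_chapter: int) -> dict:
    """Context for planning chapter N+1 carries the warnings from chapter N."""
    return {
        "chapter": next_chapter,
        "audit_warnings": load_audit_warnings(audit_dir, next_chapter - 1),
    }
```

Reading the files at plan time, rather than passing warnings through memory, keeps the hand-off working across pipeline restarts.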

3-tier hierarchical digest

↳ answers: #5

Three tiers: digest_full (never injected), digest_recent (last 3 chapters, verbatim) and digest_summary (older chapters, one sentence each, Haiku-compressed). This keeps long-horizon memory inside the context budget.
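
The tier split can be sketched like this, assuming `digest_full` is a list with one digest per finished chapter, oldest first. `build_digest_context` is a hypothetical name, and `first_sentence` is only a stand-in for the Haiku compression call.

```python
from typing import Callable


def build_digest_context(
    digest_full: list[str],
    summarize: Callable[[str], str],
    recent: int = 3,
) -> dict[str, list[str]]:
    """Split per-chapter digests into the two tiers that get injected.

    digest_full itself is never placed in a prompt.
    """
    cut = max(len(digest_full) - recent, 0)
    return {
        # Older chapters: one compressed sentence each.
        "digest_summary": [summarize(d) for d in digest_full[:cut]],
        # Last `recent` chapters: verbatim.
        "digest_recent": digest_full[cut:],
    }


def first_sentence(text: str) -> str:
    """Stand-in for the model-based compression: keep the first sentence."""
    head = text.split(". ")[0].rstrip(".")
    return head + "."
```

Prompt size then grows by one sentence per additional chapter instead of one full digest, which is what keeps recent chapters visible.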

Deterministic schema patcher

↳ answers: #6

A Python post-processor fills P1's missing schema fields from adjacent valid data. LLM retries are reserved for problems that actually need judgment.
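
A minimal sketch of such a patcher, assuming "adjacent valid data" means the most recent brief that passed validation. The five field names are invented for illustration; the postmortem does not list the real ones.

```python
# Hypothetical field names; the real mandatory fields are not listed here.
MANDATORY_FIELDS = (
    "pov_character",
    "location",
    "timeline_position",
    "tone",
    "word_target",
)


def patch_brief(brief: dict, previous_valid: dict) -> tuple[dict, list[str]]:
    """Fill missing mandatory fields from the last valid brief, with no LLM call.

    Returns the patched copy plus the names of the fields that were filled,
    so every patch can be logged and audited.
    """
    patched = dict(brief)
    filled: list[str] = []
    for name in MANDATORY_FIELDS:
        if patched.get(name) in (None, ""):
            if name not in previous_valid:
                # Nothing to copy from: this needs judgment, not patching.
                raise ValueError(f"cannot patch {name!r}: no adjacent valid value")
            patched[name] = previous_valid[name]
            filled.append(name)
    return patched, filled
```

The input brief is left unmodified, and the returned `filled` list makes it visible when the planner keeps omitting the same fields.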

What transfers.

✓ Wiring is not a runtime property. If an agent's output is not literally read by a downstream prompt, the loop does not exist.

✓ Price the decision, not the token. Quality gates are the cheapest place to spend Sonnet and the most expensive place to save it.

✓ Long-horizon quality collapses through context truncation long before it collapses through model capability.

✓ Deterministic code beats LLM retries for any problem that can be specified.

✓ Multi-agent interviews, which force specialized agents to critique the same pipeline from their own roles, surface structural problems that no single debug session finds.

→ Open frontier: replacing cloud models with the Gemma4Bolmor LoRA stack, end-to-end.