
Romanwerk — Postmortem.

50 chapters, 400+ artefacts, zero human-in-the-loop.

Romanwerk is the pipeline that produces publishable long-form German fiction without human intervention. Two complete novels shipped end-to-end. The interesting part is not the output — it is what had to be solved to make multi-agent long-horizon autonomy actually hold together across 50 chapters, where every small error compounds.

Failure modes.

01

Feedback loops that existed on paper, not in execution

The Prose-Critique agent (P3B) ran as a background job in parallel to the Reviewer (P4). Its output was never wired back into the Repair-Writer (P5). The repair agent was rewriting blind.

SEVERITY: critical · DISCOVERED VIA: multi-agent interview · STATUS: FIXED

02

Quality gates running on the cheapest model

The P4 Reviewer — the single agent deciding HOLD vs. PASS for an entire chapter — was running on Haiku. Reliable for syntax checks, unreliable for complex MAJOR_REPAIR verdicts.

SEVERITY: critical · DISCOVERED VIA: reviewer A/B audit · STATUS: FIXED

03

Cross-chapter amnesia

The Writer never saw the closing ~500 words of the previous chapter. Chapter openings silently repeated the tone, phrases and beats of their predecessors. Invisible to syntax validators, obvious to a reader.

SEVERITY: high · DISCOVERED VIA: reader review · STATUS: FIXED

04

Reactive repair instead of proactive planning

Audit warnings produced by the five P4A specialist-auditors (system, numeric-timeline, cast-agency, narrative-drift, cosmos) were written to disk and died there. The next chapter's Planner never saw them, so every chapter repeated the same mistakes.

SEVERITY: high · DISCOVERED VIA: archivist perspective · STATUS: FIXED

05

Digest drift starting at chapter 8

Rolling narrative digests silently overflowed the context budget. Early chapters dominated the prompt; recent chapters became invisible. Quality degraded in a pattern that looked like model regression but was actually context truncation.

SEVERITY: high · DISCOVERED VIA: correlation audit · STATUS: FIXED

06

Schema compliance as an infinite retry loop

The strategic-planner (P1) regularly produced creative briefs that were missing five mandatory schema fields. Validation rejected them, the Director retried, the agent produced the same creative output, and the pipeline deadlocked.

SEVERITY: blocking · DISCOVERED VIA: timeout analysis · STATUS: FIXED

Engineered response.

Explicit cross-phase wiring

answers: #1

P3B critique findings are now extracted and injected as repair_hint into P5 before the repair-writer starts. The pipeline no longer trusts that "running in parallel" means "feeding downstream".
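
The wiring described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `CritiqueFinding`, `RepairTask`, `build_repair_task` and the severity labels are hypothetical names; only `repair_hint`, P3B and P5 come from the text. The point it shows is that a missing critique is treated as an error instead of being silently skipped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CritiqueFinding:
    """One finding from the P3B prose critique (hypothetical shape)."""
    severity: str  # e.g. "minor" or "major"; these labels are assumptions
    note: str


@dataclass
class RepairTask:
    """Input handed to the P5 repair-writer (hypothetical shape)."""
    chapter_text: str
    reviewer_verdict: str
    repair_hint: list[str] = field(default_factory=list)


def build_repair_task(chapter_text, reviewer_verdict, findings):
    """Build the P5 input and fail loudly if the critique never arrived."""
    if findings is None:
        # None means P3B output was never wired in. An empty list means
        # P3B ran and found nothing. The two cases must not be conflated.
        raise RuntimeError("P3B critique missing: refusing to repair blind")
    hints = [f"[{f.severity}] {f.note}" for f in findings]
    return RepairTask(chapter_text, reviewer_verdict, hints)
```

Raising on `None` turns "the loop exists on paper" into a hard failure at the exact point where the loop should close.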

Quality-gate model tiering

answers: #2

Quality gates run on the strongest available model (Sonnet). Content generation runs on the most cost-efficient model that meets the bar. Decisions and creation are no longer priced the same.
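
One way to make the tiering explicit is to resolve the model from the agent's role rather than per agent. The sketch below is an assumption-laden illustration: the `Role` enum, the agent keys and the placeholder string `"cost-efficient-model"` are hypothetical, and real model IDs are left out because the source does not give them.

```python
from enum import Enum


class Role(Enum):
    DECISION = "decision"      # gates that emit HOLD / PASS
    GENERATION = "generation"  # writers and other content producers


# "sonnet" mirrors the tier named in the text; the generation entry is a
# placeholder, since the source only says "most cost-efficient model".
MODEL_BY_ROLE = {
    Role.DECISION: "sonnet",
    Role.GENERATION: "cost-efficient-model",
}

# Hypothetical agent registry; the keys are illustrative names.
AGENT_ROLES = {
    "P3_writer": Role.GENERATION,
    "P4_reviewer": Role.DECISION,
    "P5_repair_writer": Role.GENERATION,
}


def model_for(agent: str) -> str:
    """Resolve the model tier for an agent; unknown agents raise KeyError."""
    return MODEL_BY_ROLE[AGENT_ROLES[agent]]
```

Because an unregistered agent raises `KeyError`, a new gate cannot quietly default to the cheap tier.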

Cross-chapter memory injection

answers: #3

The last 500 words of the previous chapter plus LAST_CHAPTER_DIGEST are injected as PRE_TASK_INJECT before every P3 write, starting at chapter 002.
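
A minimal sketch of the injection step, assuming words are split on whitespace and that the injected block is plain text. The function names and the `PREVIOUS_CHAPTER_TAIL` label are hypothetical; `LAST_CHAPTER_DIGEST` and the 500-word window come from the text.

```python
def tail_words(text: str, n: int = 500) -> str:
    """Return the last n whitespace-separated words of text."""
    words = text.split()
    # Guard n <= 0 explicitly: words[-0:] would return the whole list.
    return " ".join(words[-n:]) if n > 0 else ""


def build_pre_task_inject(chapter_no, prev_chapter_text, last_chapter_digest):
    """Assemble the text injected before a P3 write.

    Chapter 001 has no predecessor, so nothing is injected for it.
    """
    if chapter_no < 2 or prev_chapter_text is None:
        return ""
    return (
        "LAST_CHAPTER_DIGEST:\n" + last_chapter_digest + "\n\n"
        "PREVIOUS_CHAPTER_TAIL:\n" + tail_words(prev_chapter_text)
    )
```

Note that `str.split()` collapses paragraph breaks; a version that preserves them would need to slice on character offsets instead.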

Audit-to-planner backpressure

answers: #4

P4A warnings from chapter N are injected into the Planner's context for chapter N+1 as audit_warnings[]. Mistakes surface once, then propagate into plans instead of repairs.
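
The backpressure step might look like the sketch below. The on-disk layout is an assumption made for illustration (one JSON file per auditor, named like `ch007_cosmos.json`, holding `{"warnings": [...]}`); the auditor names and the `audit_warnings` key come from the text.

```python
from __future__ import annotations

import json
from pathlib import Path

AUDITORS = ("system", "numeric-timeline", "cast-agency", "narrative-drift", "cosmos")


def load_audit_warnings(audit_dir: Path, chapter_no: int) -> list[str]:
    """Collect the P4A warnings written for one chapter (hypothetical layout)."""
    warnings = []
    for auditor in AUDITORS:
        path = audit_dir / f"ch{chapter_no:03d}_{auditor}.json"
        if not path.exists():
            continue  # this auditor produced no file for the chapter
        data = json.loads(path.read_text(encoding="utf-8"))
        warnings += [f"{auditor}: {w}" for w in data.get("warnings", [])]
    return warnings


def planner_context(audit_dir: Path, next_chapter_no: int) -> dict:
    """Planner input for chapter N+1 carries the warnings from chapter N."""
    prev = next_chapter_no - 1
    return {
        "chapter": next_chapter_no,
        "audit_warnings": load_audit_warnings(audit_dir, prev) if prev >= 1 else [],
    }
```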

3-tier hierarchical digest

answers: #5

The digest is split into three tiers, which keeps long-horizon memory inside the budget:

  • digest_full: never injected.
  • digest_recent: last 3 chapters, verbatim.
  • digest_summary: older chapters, one sentence each, Haiku-compressed.
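
The tiering can be sketched as a pure function over the stored digests. This is an illustration under assumptions: `build_digest_context` and its parameters are hypothetical, and `summarize` stands in for the Haiku compression call so the example runs without any model.

```python
def build_digest_context(digests, current_chapter, summarize, recent_window=3):
    """Return the digest text injected for `current_chapter`.

    digests:   dict mapping chapter number -> full digest text
               (digest_full; stored, never injected as a whole).
    summarize: callable that compresses one digest to one sentence.
    """
    past = sorted(n for n in digests if n < current_chapter)
    split = max(len(past) - recent_window, 0)
    older, recent = past[:split], past[split:]
    summary_lines = [f"Ch. {n}: {summarize(digests[n])}" for n in older]
    recent_blocks = [f"Ch. {n}:\n{digests[n]}" for n in recent]
    return {
        "digest_summary": "\n".join(summary_lines),
        "digest_recent": "\n\n".join(recent_blocks),
    }
```

With this split, the verbatim part of the prompt stays bounded by the window size, and only the one-sentence summaries grow with chapter count.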

Deterministic schema patcher

answers: #6

A Python post-processor fills P1's missing schema fields from adjacent valid data. LLM retries are reserved for problems that actually need judgment.
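
A minimal sketch of such a patcher, assuming "adjacent valid data" means the most recent brief that passed validation. The field names in `MANDATORY_FIELDS` are hypothetical: the source says five fields are mandatory but does not name them.

```python
from __future__ import annotations

import copy

# Hypothetical field names; the source does not list the five fields.
MANDATORY_FIELDS = ("pov_character", "setting", "timeline_anchor",
                    "word_target", "tone")


def patch_brief(brief: dict, previous_valid: dict) -> tuple[dict, list[str]]:
    """Fill missing mandatory fields from the last brief that validated.

    Returns a patched copy plus the names of the fields that were filled,
    so every patch is auditable. Raises if a field cannot be filled.
    """
    patched = copy.deepcopy(brief)
    filled = []
    for name in MANDATORY_FIELDS:
        if patched.get(name) not in (None, ""):
            continue  # the planner supplied this field; leave it alone
        if name not in previous_valid:
            raise ValueError(f"cannot patch {name!r}: no adjacent value")
        patched[name] = copy.deepcopy(previous_valid[name])
        filled.append(name)
    return patched, filled
```

The patcher never touches fields the planner did supply, and it raises instead of guessing, which leaves an LLM retry as the fallback only for cases that need judgment.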

What transfers.

  • Wiring is not a runtime property. If an agent's output is not literally read by a downstream prompt, the loop does not exist.
  • Price the decision, not the token. Quality gates are the cheapest place to spend Sonnet and the most expensive place to save it.
  • Long-horizon quality collapses through context truncation long before it collapses through model capability.
  • Deterministic code beats LLM retries for any problem that can be specified.
  • Multi-agent interviews — forcing specialized agents to critique the same pipeline from their own role — surface structural problems that no single debug session finds.

2 novels · 50 chapters · 400+ artefacts · 0 human interventions