Dungeon Forge — Postmortem

From 93 files of non-working code to a reusable 10-day Steam-ready chassis.

The first attempt produced 93 files and 36k lines of code that never ran as a game. The second attempt — Dungeon Forge — is both a specific 3D strategy title and a reusable methodology for autonomous game production. The rewrite is not a bug fix. It is a re-architecture of what "autonomous code generation" has to validate before it can ship.

Failure modes.

#1 Agents writing against APIs that did not exist

V1 had no contract layer. Agents invented method names, misspelled exports, and guessed constructor signatures. Code compiled file by file but failed the moment any two files had to cooperate.

SEVERITY fatal · STATUS OUTCOME V1: 93 files, zero playable builds

#2 "Syntax correct" misread as "game works"

The browser gate validated imports and syntax. Whether a dungeon was walkable, whether combat closed its loop, whether loot propagated: none of it was ever checked. The MiroFish review (a 7-agent expert panel) scored V1 6/10 and named this the fundamental gap: "we validate syntax, not game logic."

SEVERITY fatal · DISCOVERED VIA multi-agent architecture review
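
To make the syntax-versus-semantics gap concrete, here is a minimal sketch of one semantic check the V1 gate lacked: whether a dungeon is walkable from entrance to exit. The grid encoding ('.' floor, '#' wall) and the function name are assumptions for illustration; the postmortem does not describe how Dungeon Forge represents its maps.

```python
from collections import deque


def is_walkable(grid, start, goal):
    """Breadth-first search over non-wall tiles; True if goal is reachable from start."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return True
        # Explore the four orthogonal neighbours.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# Hypothetical 3x3 maps: both are syntactically valid, only one is playable.
OPEN_MAP = ["..#", ".#.", "..."]
SEALED_MAP = ["..#", "###", "..."]
```

Both maps would pass a gate that only checks that the data parses; only a check like this one distinguishes them.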

#3 Cross-file blindness in parallel generation

Each 16k-token chunk was generated without awareness of its siblings. Two files could define the same class with incompatible signatures and both pass local validation.

SEVERITY critical

#4 The Sisyphus problem

Every pipeline regeneration overwrote manually extracted style/data/shader modules. Files crept back over the 350-line cap on every run. The pipeline fought its own cleanups.

SEVERITY high

#5 Non-atomic tier transactions

A mid-tier crash left half-written files on disk. State tracking updated per file, not per tier. No rollback path.

SEVERITY critical

#6 Budget under-utilisation of 4,600×

Available budget: ~216,000 requests per 10-day window. V1 used 47. The pipeline was structurally incapable of spending the budget that a real Steam-grade build requires.

SEVERITY strategic
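
The arithmetic behind the 4,600× figure can be checked directly. The two input numbers come from the text above; the per-day and per-minute rates are derived from them and are approximate, since the budget itself is quoted as approximate.

```python
BUDGET_REQUESTS = 216_000  # available requests per window (from the text, approximate)
WINDOW_DAYS = 10
V1_REQUESTS_USED = 47      # requests V1 actually spent (from the text)

per_day = BUDGET_REQUESTS / WINDOW_DAYS               # 21,600 requests per day
per_minute = per_day / (24 * 60)                      # sustained rate needed to spend it all
under_utilisation = BUDGET_REQUESTS / V1_REQUESTS_USED

print(f"{per_day:,.0f}/day, {per_minute:.0f}/min, {under_utilisation:,.0f}x unused headroom")
```

Spending the full budget means sustaining about 15 requests per minute around the clock for the whole window, which is why the budget is a scheduling concern rather than a cost ceiling.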

Engineered response.

Contract-first architecture

↳ answers: #1, #3

Three JSON contracts define the system before code exists: interfaces.json (module exports, methods, dependencies), events.json (event registry), constructors.json (signatures). Agents cannot invent APIs — every generated file is validated against the contract.
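
A minimal sketch of what validating a generated file against interfaces.json could look like. The contract schema, the module name LootSystem, and its methods are hypothetical; the postmortem names the three contract files but does not show their format. Python's ast module is used purely for illustration, and the real validator would parse whatever language the generated files are written in.

```python
import ast

# Hypothetical contract entry standing in for one record of interfaces.json.
INTERFACES = {
    "LootSystem": {
        "exports": ["LootSystem"],
        "methods": ["roll_drop", "grant"],
        "dependencies": ["EventBus"],
    }
}


def contract_violations(module_name, source, interfaces):
    """Return human-readable contract violations for one generated file."""
    contract = interfaces[module_name]
    tree = ast.parse(source)
    classes = {n.name: n for n in tree.body if isinstance(n, ast.ClassDef)}
    allowed = set(contract["methods"])
    problems = []
    for export in contract["exports"]:
        cls = classes.get(export)
        if cls is None:
            problems.append(f"missing export: {export}")
            continue
        # Public methods actually defined on the exported class.
        defined = {
            n.name
            for n in cls.body
            if isinstance(n, ast.FunctionDef) and not n.name.startswith("_")
        }
        for name in sorted(allowed - defined):
            problems.append(f"{export}: missing method {name}")
        for name in sorted(defined - allowed):
            problems.append(f"{export}: uncontracted method {name}")
    return problems
```

An empty list means the file matches its contract; anything else is grounds to reject the file before it reaches disk, so an invented method name fails at generation time rather than at integration time.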

EventBus pub/sub, zero cross-system imports

↳ answers: #3

Systems communicate only through a shared EventBus. No system imports another. Fully parallel agent development becomes mechanically safe.
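
The pattern can be sketched in a few lines. This is an illustrative pub/sub bus, not the project's EventBus: the class layout, the event name enemy_killed, and the two example systems are all assumptions. The registry check stands in for validation against events.json.

```python
from collections import defaultdict


class EventBus:
    """Pub/sub bus that rejects events missing from the registry."""

    def __init__(self, known_events):
        self._known = set(known_events)
        self._handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self._require_known(event)
        self._handlers[event].append(handler)

    def publish(self, event, payload=None):
        self._require_known(event)
        # Copy the list so handlers may subscribe during dispatch.
        for handler in list(self._handlers[event]):
            handler(payload)

    def _require_known(self, event):
        if event not in self._known:
            raise KeyError(f"event not in registry: {event}")


class CombatSystem:
    """Knows only the bus; has no reference to LootSystem."""

    def __init__(self, bus):
        self._bus = bus

    def kill(self, enemy_id):
        self._bus.publish("enemy_killed", {"enemy_id": enemy_id})


class LootSystem:
    """Reacts to combat events without importing CombatSystem."""

    def __init__(self, bus):
        self.drops = []
        bus.subscribe("enemy_killed", self._on_enemy_killed)

    def _on_enemy_killed(self, payload):
        self.drops.append(payload["enemy_id"])
```

Because neither system names the other, two agents can generate them in parallel and the only shared surface is the event name and payload shape.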

Cross-file awareness in the coder prompt

↳ answers: #3

Every generation call sees sibling signatures and the contents of files it imports. Class definitions converge instead of diverging.
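
As a rough sketch of the idea, the context for one generation call can be assembled from two sources: the signatures of every sibling module and the full text of the files the target imports. The function name, argument shapes, and section markers below are hypothetical; the actual coder prompt is not shown in this postmortem.

```python
def build_coder_context(target, signatures, imports, sources):
    """Assemble prompt context for `target`.

    signatures: module name -> list of signature strings
    imports:    module name -> list of module names it imports
    sources:    module name -> full file contents
    """
    lines = ["# Sibling signatures"]
    for name in sorted(signatures):
        if name != target:  # the target's own signatures are what is being generated
            lines.extend(signatures[name])
    lines.append("# Imported files")
    for dep in imports.get(target, []):
        lines.append(f"## {dep}")
        lines.append(sources[dep])
    return "\n".join(lines)
```

Siblings contribute signatures only, which keeps the context small; imported files are included in full because the target calls into them directly.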

Four-layer extraction-module protection

↳ answers: #4

CANONICAL.md table + director prompt rules + write-guards at all three write paths + interfaces.json extraction_module flag. Twelve extraction modules survive every regeneration intact.
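
One of the four layers, the write-guard keyed on the extraction_module flag, might look roughly like this. Only the flag name comes from the text; the "path" key, the file paths, and the function names are assumptions, and the writer is injected so the guard can wrap each write path in the same way.

```python
class ProtectedModuleError(Exception):
    """Raised when a regeneration tries to overwrite a protected module."""


# Hypothetical slice of interfaces.json; only `extraction_module` is named in the text.
CONTRACT = {
    "ShaderLibrary": {"path": "src/shaders/library.js", "extraction_module": True},
    "LootSystem": {"path": "src/systems/loot.js"},
}


def is_protected(path, interfaces):
    """True if `path` belongs to a module flagged as an extraction module."""
    return any(
        entry.get("path") == path and entry.get("extraction_module", False)
        for entry in interfaces.values()
    )


def guarded_write(path, content, interfaces, write):
    """Call write(path, content) unless the target is a protected extraction module."""
    if is_protected(path, interfaces):
        raise ProtectedModuleError(path)
    write(path, content)
```

Raising instead of silently skipping makes the guard a blocking check: the pipeline has to handle the refusal rather than continue as if the write succeeded.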

Atomic tier transactions with git-tag rollback

↳ answers: #5

Every tier boundary is a git tag. Failure at tier N resets to tier N-1 automatically. No orphan files.
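
A minimal sketch of a tier transaction built on git tags. The tag naming scheme (tier-N), the commit message, and the function names are assumptions, and a tier-0 tag is assumed to exist as the baseline. The git subcommands used (reset --hard, clean -fd, add -A, commit, tag -f) are standard; the runner is injectable so the control flow can be exercised without a repository.

```python
import subprocess


def git(*args, cwd="."):
    """Run one git command and raise if it fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True)


def tier_tag(n):
    return f"tier-{n}"  # hypothetical naming scheme


def run_tier(n, build_tier, run=git):
    """Run tier n as a transaction: tag on success, reset to tier n-1 on failure."""
    try:
        build_tier(n)
    except Exception:
        # Discard tracked changes, then remove untracked (orphan) files.
        run("reset", "--hard", tier_tag(n - 1))
        run("clean", "-fd")
        return False
    run("add", "-A")
    run("commit", "--allow-empty", "-m", f"tier {n} complete")
    run("tag", "-f", tier_tag(n))
    return True
```

The tag is only written after the whole tier succeeds, so state advances per tier rather than per file, and a crash halfway through leaves the previous tag as the single source of truth.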

Iteration Pipeline v2 — a 10-day chassis

↳ answers: #2, #6

Director v2 (3-lane FSM), SLBB budget-broker, rate-guard, regression-gate, playtest-bot (LLM player via Playwright), balance simulator (1,000 playthroughs/day), debug agent, blocking visual gate. Eight modules, ~2,400 LOC, smoke-verified.

Portable by design

cp the folder, edit project-config.json, run start_iteration.sh. The chassis is not specific to Dungeon Forge — it is the methodology for turning an empty repo into a Steam-ready game in 10 days.

What transfers.

✓ Contracts defined before generation beat contracts inferred after. Agent hallucination of APIs is eliminated by construction, not caught by review.

✓ Syntax validation and semantic validation are different products. Shipping autonomous code needs both.

✓ Gate enforcement must be BLOCKING. A non-blocking gate is a logged warning; agents will treat it as noise.

✓ A 10-day autonomous pipeline is a budget-allocation problem before it is a code-generation problem. Under-spending the budget is the failure mode, not over-spending it.

✓ The same chassis that ships one game should ship the next one with a config edit. Project-specific hand-wiring is a smell, not a feature.

→ Open frontier: first full 10-day autonomous run to Steam release-candidate, live now.