roger@daliai~/workshop/dungeon-forge· postmortem
LUCERNE · 47.05°Nreadonly
← back to workshop
PID 03 / dungeon-forgePOSTMORTEM

Dungeon Forge — Postmortem.

From 93 files of non-working code to a reusable 10-day Steam-ready chassis.

The first attempt produced 93 files and 36k lines of code that never ran as a game. The second attempt — Dungeon Forge — is both a specific 3D strategy title and a reusable methodology for autonomous game production. The rewrite is not a bug fix. It is a re-architecture of what "autonomous code generation" has to validate before it can ship.

// failure modes

Failure modes.

01

Agents writing against APIs that did not exist

V1 had no contract layer. Agents invented method names, mis-spelled exports, guessed constructor signatures. Code compiled file-by-file, failed the moment any two files had to cooperate.

SEVERITY fatalSTATUS OUTCOME V1: 93 files, zero playable builds
02

"Syntax correct" misread as "game works"

The browser gate validated imports and syntax. Whether a dungeon was walkable, whether combat closed its loop, whether loot propagated — never checked. MiroFish review (7-agent expert panel) scored V1 a 6/10 and named this the fundamental gap: "we validate syntax, not game logic."

SEVERITY fatalDISCOVERED VIA multi-agent architecture review
03

Cross-file blindness in parallel generation

Each 16k-token chunk was generated without awareness of its siblings. Two files could define the same class with incompatible signatures and both pass local validation.

SEVERITY critical
04

The Sisyphus problem

Every pipeline regeneration overwrote manually extracted style/data/shader modules. Files crept back over the 350-line cap on every run. The pipeline fought its own cleanups.

SEVERITY high
05

Non-atomic tier transactions

A mid-tier crash left half-written files on disk. State tracking updated per file, not per tier. No rollback path.

SEVERITY critical
06

Budget under-utilisation of 4,600×

Available budget: ~216,000 requests per 10-day window. V1 used 47. The pipeline was structurally incapable of spending the budget that a real Steam-grade build requires.

SEVERITY strategic
// engineered response

Engineered response.

Contract-first architecture

answers: #1, #3

Three JSON contracts define the system before code exists: interfaces.json (module exports, methods, dependencies), events.json (event registry), constructors.json (signatures). Agents cannot invent APIs — every generated file is validated against the contract.

EventBus pub/sub, zero cross-system imports

answers: #3

Systems communicate only through a shared EventBus. No system imports another. Fully parallel agent development becomes mechanically safe.

Cross-file awareness in the coder prompt

answers: #3

Every generation call sees sibling signatures and the contents of files it imports. Class definitions converge instead of diverging.

Four-layer extraction-module protection

answers: #4

CANONICAL.md table + director prompt rules + write-guards at all three write paths + interfaces.json extraction_module flag. Twelve extraction modules survive every regeneration intact.

Atomic tier transactions with git-tag rollback

answers: #5

Every tier boundary is a git tag. Failure at tier N resets to tier N-1 automatically. No orphan files.

Iteration Pipeline v2 — a 10-day chassis

answers: #2, #6

Director v2 (3-lane FSM), SLBB budget-broker, rate-guard, regression-gate, playtest-bot (LLM player via Playwright), balance simulator (1,000 playthroughs/day), debug agent, blocking visual gate. Eight modules, ~2,400 LOC, smoke-verified.

Portable by design

cp the folder, edit project-config.json, run start_iteration.sh. The chassis is not specific to Dungeon Forge — it is the methodology for turning an empty repo into a Steam-ready game in 10 days.

// what transfers

What transfers.

  • Contracts defined before generation beat contracts inferred after. Agent hallucination of APIs is eliminated by construction, not caught by review.
  • Syntax validation and semantic validation are different products. Shipping autonomous code needs both.
  • Gate enforcement must be BLOCKING. A non-blocking gate is a logged warning — agents will treat it as noise.
  • A 10-day autonomous pipeline is a budget-allocation problem before it is a code-generation problem. Under-spending the budget is the failure mode, not over-spending it.
  • The same chassis that ships one game should ship the next one with a config edit. Project-specific hand-wiring is a smell, not a feature.
11
Agents
3
Lanes
10
Target days
v2.1.1
Pipeline
← back to workshop