Benchmarks
Comparative performance analysis of Archon Specs against industry standards.
Terminal Bench (Success Rate)
Percentage of tasks completed successfully in a terminal environment without human intervention.
Architectural Drift Prevention
Ability to detect and reconcile structural drift over 100 iterative updates.
DriftDetector compares the desired state manifest against the observed filesystem. Generic scaffolds are re-generated from scratch without a state manifest.
What it means: Archon never loses track of what it generated. A 100% score means no file drifts silently: every change is intentional and traceable, which is what keeps long-running projects safe.
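The manifest-versus-filesystem comparison described above can be sketched as follows. This is a minimal illustration, not Archon's actual DriftDetector: the `Manifest`, `Workspace`, and `DriftReport` shapes and the `detectDrift` function are assumptions made for the example, and the workspace is modeled as an in-memory map rather than a real directory.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes (not from Archon): the manifest maps each owned path to
// the content hash recorded at generation time; the workspace maps each path
// to the content currently observed.
type Manifest = Map<string, string>;
type Workspace = Map<string, string>;

interface DriftReport {
  modified: string[];  // owned files whose content no longer matches the manifest
  missing: string[];   // owned files that have disappeared
  untracked: string[]; // files present in the workspace but not in the manifest
}

const sha256 = (content: string): string =>
  createHash("sha256").update(content).digest("hex");

function detectDrift(manifest: Manifest, observed: Workspace): DriftReport {
  const report: DriftReport = { modified: [], missing: [], untracked: [] };
  manifest.forEach((expectedHash, path) => {
    const content = observed.get(path);
    if (content === undefined) {
      report.missing.push(path);
    } else if (sha256(content) !== expectedHash) {
      report.modified.push(path);
    }
  });
  observed.forEach((_content, path) => {
    if (!manifest.has(path)) report.untracked.push(path);
  });
  return report;
}

// Usage: one file edited by hand after generation.
const desired: Manifest = new Map<string, string>();
desired.set("src/app.module.ts", sha256("export class AppModule {}"));
const onDisk: Workspace = new Map<string, string>();
onDisk.set("src/app.module.ts", "export class AppModule { /* edited */ }");
console.log(detectDrift(desired, onDisk).modified); // [ 'src/app.module.ts' ]
```

A scaffold that regenerates from scratch has no `desired` map to compare against, which is why it cannot distinguish an intentional edit from silent drift.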
Performance
Latency benchmarks for core pipeline operations. Measured over 10 warmed runs on macOS (Apple M-series, Node 22). Last run: 2026-05-15.
Spec Compilation Latency
Time to normalize and compile a DesignSpec at three scale tiers (10 runs after a discarded warmup, timed with performance.now()).
| Scale | Entities | Avg (ms) | Min (ms) | Max (ms) | Entities/sec |
|---|---|---|---|---|---|
| Small | 5 | 0.010 | 0.009 | 0.012 | 518,350 |
| Medium | 50 | 0.090 | 0.084 | 0.101 | 555,734 |
| Enterprise | 200 | 0.384 | 0.290 | 0.631 | 520,365 |
Measures the time for normalizeSpec() to deep-clone, sort, sanitize, and canonicalize a DesignSpec into a deterministic, bit-identical form ready for hashing and code generation.
How: Synthetic specs are built at three sizes (5 / 50 / 200 entities). Each is run 10 times after a warmup pass; the average is computed from performance.now() timings, which have sub-millisecond resolution.
What it means: Compilation time grows roughly linearly with entity count but stays under 1 ms even at enterprise scale (0.384 ms average, 0.631 ms worst case at 200 entities), so spec validation and re-compilation add no perceptible latency to the generation pipeline at the sizes tested.
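The measurement procedure above (synthetic specs, one discarded warmup, 10 timed runs) might look like the sketch below. The `Spec` shape and `normalizeSketch` are simplified stand-ins invented for this example, since the real DesignSpec type and normalizeSpec() implementation are not shown here; only the timing harness mirrors the described method.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical, simplified spec shape (the real DesignSpec is not shown here).
interface Entity { name: string; fields: string[]; }
interface Spec { entities: Entity[]; }

// Stand-in normalizer: deep-clone, then sort fields and entities so that
// equivalent specs serialize to the same string.
function normalizeSketch(spec: Spec): Spec {
  const clone: Spec = JSON.parse(JSON.stringify(spec));
  for (const entity of clone.entities) entity.fields.sort();
  clone.entities.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return clone;
}

// Build a synthetic spec in deliberately unsorted order.
function buildSpec(entityCount: number): Spec {
  const entities: Entity[] = [];
  for (let i = entityCount - 1; i >= 0; i--) {
    entities.push({ name: `Entity${i}`, fields: ["updatedAt", "id", "createdAt"] });
  }
  return { entities };
}

interface BenchResult { avgMs: number; minMs: number; maxMs: number; entitiesPerSec: number; }

function benchNormalize(entityCount: number, runs = 10): BenchResult {
  const spec = buildSpec(entityCount);
  normalizeSketch(spec); // warmup pass, result and timing discarded
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    normalizeSketch(spec);
    samples.push(performance.now() - start);
  }
  const avgMs = samples.reduce((sum, x) => sum + x, 0) / runs;
  return {
    avgMs,
    minMs: Math.min(...samples),
    maxMs: Math.max(...samples),
    entitiesPerSec: entityCount / (avgMs / 1000),
  };
}

for (const n of [5, 50, 200]) console.log(n, benchNormalize(n));
```

The absolute numbers from this stand-in will not match the table, because it does far less work than a full normalizer; the point is the shape of the harness.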
Stream Materialization Throughput
File-write operations per second when applying an execution plan to the Virtual File System. Median of 3 runs.
Measures the cost of each fs.outputFile() call, which is what happens when Archon applies an execution plan to a workspace.
How: Batches of 10, 50, and 200 TypeScript source files are written to a temp directory. Each batch is run 3 times; the median duration is recorded to avoid outliers from OS scheduling.
What it means: A typical enterprise generation of ~50 files completes in under 4 ms on disk. The throughput plateau around 12,500 ops/sec across 50–200 ops suggests the bottleneck is file-system I/O, not Archon's own logic.
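A rough sketch of the batch-and-median procedure is shown below. It is an approximation under stated assumptions: synchronous `mkdirSync` and `writeFileSync` from Node's standard library stand in for fs.outputFile() (which creates parent directories before writing), and the file layout and contents are made up for the example.

```typescript
import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { performance } from "node:perf_hooks";

function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Write `count` small .ts files into a fresh temp directory; return elapsed ms.
function writeBatch(count: number): number {
  const root = mkdtempSync(join(tmpdir(), "write-bench-"));
  try {
    const start = performance.now();
    for (let i = 0; i < count; i++) {
      const file = join(root, "src", `module-${i}`, "index.ts"); // hypothetical layout
      mkdirSync(dirname(file), { recursive: true }); // create parents, as outputFile does
      writeFileSync(file, `export const id = ${i};\n`);
    }
    return performance.now() - start;
  } finally {
    rmSync(root, { recursive: true, force: true }); // always clean up the temp dir
  }
}

// Run each batch `runs` times and derive ops/sec from the median duration.
function opsPerSecond(count: number, runs = 3): number {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) durations.push(writeBatch(count));
  return count / (median(durations) / 1000);
}

for (const n of [10, 50, 200]) {
  console.log(`${n} files: ${Math.round(opsPerSecond(n))} ops/sec`);
}
```

Using the median rather than the mean means a single run delayed by the OS scheduler does not skew the reported throughput.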
Code Quality
Percentage of generated NestJS files that are immediately compilable and lint-clean without modification.
TypeScript Compiler Pass Rate
Generated .ts files that pass tsc --noEmit out of the box.
Measures the share of generated .ts files that the TypeScript compiler accepts without modification: no missing imports, no type mismatches, no undefined references.
How: tsc --noEmit --skipLibCheck is run against all generated files from the enterprise social-network spec (47 files). Each compiler error counts against the file that triggered it.
What it means: 94% of Archon's output (about 44 of the 47 files) is immediately valid TypeScript. The remaining 6% are edge-case inter-module imports that require a full npm install to resolve, not logic errors. Generic scaffolds average ~55% because they generate plausible-looking code without enforcing type contracts across files.
Based on the enterprise social-network spec (47 generated files). Run npx ts-node scripts/pipeline-test.ts && npm run benchmark to refresh.
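The per-file counting rule described above (every compiler error counts against the file that triggered it) could be computed from compiler output roughly as follows. This sketch assumes tsc's non-pretty diagnostic format, `path(line,col): error TSxxxx: message`, which is what tsc prints with `--pretty false` or when its output is piped. The function name, file names, and sample diagnostics are invented for illustration and are not Archon's benchmark script.

```typescript
// Matches non-pretty tsc diagnostics such as:
//   src/a.ts(3,25): error TS2307: Cannot find module './b'.
const TSC_ERROR = /^(.+?)\((\d+),(\d+)\): error TS\d+:/;

// Fraction of generated files with no compiler error attributed to them.
// Assumes `generatedFiles` uses the same relative paths that tsc prints.
function compilerPassRate(generatedFiles: string[], tscOutput: string): number {
  const failing = new Set<string>();
  for (const line of tscOutput.split(/\r?\n/)) {
    const match = TSC_ERROR.exec(line);
    if (match) failing.add(match[1]); // several errors in one file count once
  }
  if (generatedFiles.length === 0) return 1;
  const passing = generatedFiles.filter((file) => !failing.has(file)).length;
  return passing / generatedFiles.length;
}

// Made-up sample: two errors, both in the same file.
const sampleOutput = [
  "src/users/users.service.ts(3,25): error TS2307: Cannot find module '../posts/posts.module' or its corresponding type declarations.",
  "src/users/users.service.ts(9,5): error TS2304: Cannot find name 'PostsService'.",
].join("\n");
const sampleFiles = [
  "src/main.ts",
  "src/app.module.ts",
  "src/users/users.service.ts",
  "src/posts/posts.service.ts",
];
console.log(compilerPassRate(sampleFiles, sampleOutput)); // 0.75
```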
ESLint Error-Free Rate
Generated .ts files with zero ESLint errors (no-undef, no-unused-vars).
How: ESLint is run with the no-undef: error and no-unused-vars: error rules across all generated files. Errors are counted per file; a file is clean only if it has zero errors.
What it means: Every symbol Archon generates is either properly imported or explicitly declared; nothing is invented or left dangling. This is a direct consequence of generating from typed Handlebars templates rather than asking an LLM to write code freehand.
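As an illustration of the "clean only if zero errors" rule, the rate can be derived from the per-file results that ESLint's built-in JSON formatter emits (`eslint --format json`), each of which carries a `filePath` and an `errorCount`. The `LintResult` interface below is a hand-written subset of those fields, and `errorFreeRate` and the sample data are hypothetical names and values used only for this sketch.

```typescript
// Subset of the per-file fields in ESLint's JSON formatter output.
interface LintResult {
  filePath: string;
  errorCount: number;
  warningCount: number;
}

// A file is clean only if it has zero errors; warnings do not count against it.
function errorFreeRate(results: LintResult[]): number {
  if (results.length === 0) return 1;
  const clean = results.filter((result) => result.errorCount === 0).length;
  return clean / results.length;
}

// Made-up sample in the shape the JSON formatter produces.
const sampleJson =
  '[{"filePath":"src/a.ts","errorCount":0,"warningCount":1},' +
  '{"filePath":"src/b.ts","errorCount":2,"warningCount":0}]';
console.log(errorFreeRate(JSON.parse(sampleJson) as LintResult[])); // 0.5
```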
Scaling
How Archon components perform as workspace size and concurrency grow. Last run: 2026-05-15.
Drift Detection — Artifact Scaling
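Time to scan all owned artifacts for drift, with 10% drift injected per run. The results are consistent with O(n) linear scaling, although the two smaller workspaces finish below the 1 ms reporting resolution.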
| Artifacts | Scan Time | Artifacts/sec | Drift Detected |
|---|---|---|---|
| 100 | < 1 ms | > 100,000 | 10 / 10 |
| 1,000 | < 1 ms | > 1,000,000 | 100 / 100 |
| 10,000 | 2 ms | 5,000,000 | 1,000 / 1,000 |
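A scaling run like the one tabulated above can be approximated with the sketch below, which injects drift into every tenth artifact and times a single linear pass. The `Artifact` record, the string "hashes", and the function names are assumptions for illustration; a real scan would also read and hash files, which this in-memory version deliberately leaves out.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical artifact record: the hash recorded at generation time and the
// hash observed now. Plain strings stand in for real content hashes.
interface Artifact { path: string; expectedHash: string; observedHash: string; }

// Build `count` artifacts and mark a fixed share of them as drifted.
function buildArtifacts(count: number, driftRatio = 0.1): Artifact[] {
  const every = Math.round(1 / driftRatio); // 0.1 -> every 10th artifact drifts
  const artifacts: Artifact[] = [];
  for (let i = 0; i < count; i++) {
    const drifted = i % every === 0;
    artifacts.push({
      path: `src/file-${i}.ts`,
      expectedHash: `h${i}`,
      observedHash: drifted ? `h${i}-edited` : `h${i}`,
    });
  }
  return artifacts;
}

// Single pass with one comparison per artifact, so work grows as O(n).
function scan(artifacts: Artifact[]): string[] {
  const drifted: string[] = [];
  for (const artifact of artifacts) {
    if (artifact.expectedHash !== artifact.observedHash) drifted.push(artifact.path);
  }
  return drifted;
}

for (const n of [100, 1000, 10000]) {
  const artifacts = buildArtifacts(n);
  const start = performance.now();
  const found = scan(artifacts);
  const ms = performance.now() - start;
  console.log(`${n} artifacts: ${ms.toFixed(3)} ms, drift detected ${found.length} / ${n / 10}`);
}
```

Printing fractional milliseconds, as here, makes linearity easier to check than a "< 1 ms" floor does.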
Worker Pool Queue Throughput
Simulated dispatch queue throughput at increasing concurrency levels (200 jobs per run, pure queue overhead — no real MCP workers).
| Concurrency | Jobs | Total | Avg Wait | p95 Wait | Throughput |
|---|---|---|---|---|---|
| 1 | 200 | < 1 ms | 0.00 ms | 0 ms | > 200,000/sec |
| 5 | 200 | 1 ms | 0.11 ms | 1 ms | 200,000/sec |
| 10 | 200 | < 1 ms | 0.00 ms | 0 ms | > 200,000/sec |
| 20 | 200 | < 1 ms | 0.00 ms | 0 ms | > 200,000/sec |
How: Each job is a no-op (Promise.resolve()), isolating queue scheduling overhead from tool execution time.
What it means: The scheduler adds negligible overhead at every concurrency level tested: average wait never exceeded 0.11 ms and p95 wait never exceeded 1 ms. In production, observed latency should therefore come almost entirely from the MCP tool calls themselves; the pool introduces no meaningful queuing tax, even at a concurrency of 20.
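The queue-only measurement can be sketched with a small concurrency-limited dispatcher whose jobs resolve immediately. This is an assumed design, not Archon's worker pool: `runQueue`, the pull-based worker loop, and the nearest-rank p95 calculation are all choices made for this example, and it expects at least one job.

```typescript
import { performance } from "node:perf_hooks";

type Job = () => Promise<void>;

interface QueueStats {
  totalMs: number;
  avgWaitMs: number;
  p95WaitMs: number;
  jobsPerSec: number;
}

// Dispatch `jobCount` no-op jobs through at most `concurrency` workers and
// record how long each job waited between enqueue and dispatch.
async function runQueue(jobCount: number, concurrency: number): Promise<QueueStats> {
  const noop: Job = () => Promise.resolve(); // no real tool call, queue overhead only
  const waits: number[] = [];
  const enqueuedAt = performance.now(); // all jobs are enqueued up front
  let next = 0;

  // Each worker pulls the next job until the queue is drained.
  async function worker(): Promise<void> {
    while (next < jobCount) {
      next++;
      waits.push(performance.now() - enqueuedAt);
      await noop();
    }
  }

  const workers: Promise<void>[] = [];
  for (let i = 0; i < concurrency; i++) workers.push(worker());
  await Promise.all(workers);

  const totalMs = performance.now() - enqueuedAt;
  const sorted = waits.slice().sort((a, b) => a - b);
  const p95Index = Math.min(sorted.length - 1, Math.ceil(sorted.length * 0.95) - 1);
  return {
    totalMs,
    avgWaitMs: waits.reduce((sum, w) => sum + w, 0) / waits.length,
    p95WaitMs: sorted[p95Index], // nearest-rank percentile
    jobsPerSec: jobCount / (totalMs / 1000),
  };
}

async function main(): Promise<void> {
  for (const concurrency of [1, 5, 10, 20]) {
    console.log(concurrency, await runQueue(200, concurrency));
  }
}
main();
```

Because every job is a no-op, any wait recorded here is scheduling cost alone; with real MCP workers, tool execution time would be added on top.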