vitaliikapliuk/modelharness
61 stars · Last commit 2026-06-14
Make every model cheaper or better. Zero-config behavioral harness for Claude Code, with a reproducible 408-run benchmark
README preview
# modelharness **Make every model cheaper or better. Measured on four Claude models — none got worse.**    **What it actually is:** a zero-config Claude Code plugin. On every session start it injects a ≈910-token behavioral core — six working practices distilled from how Fable 5 was trained to operate — plus three on-demand skills and a fresh-context verifier agent. No commands to learn; the model just starts working differently: | | | |---|---| | ✅ **Grounded progress** — only claims backed by a tool result; "tests fail" said plainly | ⚡ **Act, don't overplan** — enough information means act, not narrate options | | 🎯 **Autonomy calibration** — decides minor things itself, asks only on scope or destructive actions | 🔍 **Self-verification loops** — a checkable definition of done, real checks on a cadence, a fresh-context verifier before "done" | | 🔀 **Delegation triggers** — explicit rules for when to fan work out to subagents | 📝 **Cross-session memory** — writes lessons and plans to files, so the next session can pick up the work | ## What the 17 tasks test | | | | |---|---|---|