Building a Harness: How I Got Agents to Verify Their Own Code
I wanted to see how far I could push autonomous code generation. Not by writing better prompts, but by building a system where agents could implement, verify, and fix their own work without me watching. A DOCX editor built from behavioral specs and pixel diffs was the testbed.