Blog

lf-lean: The frontier of verified software engineering

As AI generates code faster than humans can review it, oversight becomes the bottleneck. We introduce task-level specification, a framework where correctness is defined once for an entire class of software tasks and automatically checked across all instances—reducing human oversight from O(n) to O(1) in the number of instances. We prototype this with rocq-dove, an environment for verified translation from Rocq to Lean that scales to arbitrary program complexity. Using frontier AI, we produced lf-lean: a verified translation of all 1,276 statements in the Logical Foundations textbook. Models autonomously solved 97% of statements; our total human-in-the-loop effort on the remaining problems was ~2 person-days, versus an estimated ~2.75 person-years if we had done the whole project manually. These results suggest that verified software engineering, long considered impractical, may become more scalable than unverified software engineering in the era of AI-led development.
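To illustrate the kind of translation involved (a hypothetical pair chosen for illustration, not drawn from lf-lean itself): a Logical Foundations-style Rocq statement such as `Theorem add_comm : forall n m : nat, n + m = m + n.` would correspond to a Lean statement like:

```lean
-- Hypothetical Lean counterpart of the Rocq statement
-- `Theorem add_comm : forall n m : nat, n + m = m + n.`
theorem add_comm' (n m : Nat) : n + m = m + n := by
  omega
```

The verification burden then shifts from reviewing each translated statement by hand to checking, once per task class, that the Lean statement faithfully mirrors the Rocq original.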

Systematically generating tests that would have caught Anthropic’s top‑K bug

We introduce fractional proof decomposition, a technique for scaling testing compute logarithmically, instead of linearly, with bug rarity. We achieve this efficiency by fusing partial evaluation and property-based testing.
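Fractional proof decomposition itself is not spelled out in this summary, but one of its two named ingredients, property-based testing, can be sketched in a few lines. The example below is a minimal illustration, not the paper's method: it checks a hypothetical `top_k` function (echoing the bug in the title) against a correctness property on randomly generated inputs.

```python
import random

def top_k(xs, k):
    """Return the k largest elements of xs in descending order."""
    return sorted(xs, reverse=True)[:k]

def prop_top_k(xs, k):
    """Property: the output has length min(k, len(xs)), is drawn from xs,
    and every kept element is >= every dropped element."""
    out = top_k(xs, k)
    if len(out) != min(k, len(xs)):
        return False
    rest = list(xs)
    for v in out:
        rest.remove(v)  # raises ValueError if out is not a sub-multiset of xs
    return all(a >= b for a in out for b in rest)

def run_property_tests(trials=1000, seed=0):
    """Plain random testing: cost grows linearly with bug rarity.
    The paper's contribution is making this logarithmic instead."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        k = rng.randint(0, 25)
        assert prop_top_k(xs, k), (xs, k)
    return trials

print(run_property_tests())
```

A rare bug (say, one triggered only by a specific tie pattern) would require on the order of 1/p random trials to surface under this naive loop; the technique above is the linear baseline that fractional proof decomposition is claimed to improve on.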