I wrote my first test suite in 2012. Red, green, refactor. It sounded like a great idea. For years, having a solid test suite was how I knew the code I shipped actually worked. Write the test first, watch it fail, write the code to make it pass. It was disciplined. It was repeatable. But it was hard to do.
I never really felt that TDD was the right approach, but I definitely saw the benefits of a well-tested software product. Tests gave me two main things: predictability and structure. That structure can feel like unnecessary overhead on smaller projects, but on every large project I ever worked on, it was a critical part of the application itself.
It’s not enough anymore.
What Changed
AI writes code now. Not toy code. Production code. The kind of code that used to take a week takes an afternoon. But I learned quickly that when you hand an AI agent a vague description of what you want, you get back an approximation of what you want. The more specific you are about the requirements and details, the better your result will be.
Test-driven development assumes you’ve already figured out what to build. The tests encode behavior, but they don’t capture intent. They tell you whether the function returns the right value. They don’t tell you why the function exists, what problem it solves for the user, or how it fits into the larger system. When a human writes every line of code, that context lives in the developer’s head. When an AI writes the code, that context needs to live somewhere it can read.
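To make that concrete, here’s a toy example. The function and test are hypothetical, not from our codebase: the test fully pins down the behavior and passes, but nothing in it records the intent.

```python
# Hypothetical example: a passing test that encodes behavior, not intent.
def apply_discount(price: float, rate: float) -> float:
    return round(price * (1 - rate), 2)

def test_apply_discount():
    # This proves the math is right. It doesn't say why the discount
    # exists, who qualifies for it, or what the product should do at
    # the edges it actually cares about (negative rates? stacking?).
    assert apply_discount(100.0, 0.15) == 85.0
```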
So we started writing specs.
What Spec-Driven Development Looks Like at Olea
At Olea Edge Analytics, we build software for municipal water utilities. Our platform, Magellan, gives operations teams a way to query their data conversationally and run automated analytics workflows. These systems monitor critical infrastructure. When something breaks, a city’s water service is affected.
We can’t afford to ship code that “technically passes the tests” but behaves in ways nobody intended. So we moved the thinking upstream.
Every feature starts as a spec. Plain English. Not pseudocode, not user stories on index cards, not a Jira ticket with two sentences of context. A full description of what we’re building, why we’re building it, how it should behave, what it should not do, and how we’ll know it’s done. The spec gets reviewed by a team of AI agents, each with a different persona, and then by a human before any code gets written. Sometimes it goes through two or three review rounds. That review process is where the real engineering happens.
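For illustration only, here’s a minimal sketch of what a skeleton along those lines might look like. The headings are hypothetical, not our actual template:

```
Feature: <one-line summary>

Why: the user problem this solves and how it fits the larger system.
Behavior: what the feature should do, in plain English.
Non-goals: what it explicitly should not do.
Done when: observable criteria we can verify before shipping.
Open questions: anything the review rounds still need to settle.
```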
Once the spec is solid, we hand it to our implementation pipeline. An AI agent reads the spec, breaks it into ordered tasks, and executes them one by one. Each task produces a focused commit. The whole thing is traceable back to the spec that authorized it.
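If you want a feel for the shape of that loop, here’s a minimal sketch in Python. It is not our actual pipeline: the agent work is stubbed out, and every name in it (parse_tasks, run_task, implement) is a stand-in for whatever agent framework and tooling you use.

```python
# A sketch of a spec-to-commit loop: one focused commit per task,
# each commit traceable back to the spec that authorized it.
from dataclasses import dataclass
from pathlib import Path
import subprocess

@dataclass
class Task:
    summary: str
    instructions: str

def parse_tasks(spec_text: str) -> list[Task]:
    # In practice an agent breaks the spec into ordered tasks; here we
    # fake it with one task per "- [ ]" checklist line in the spec.
    return [
        Task(summary=line.removeprefix("- [ ]").strip(), instructions=line)
        for line in spec_text.splitlines()
        if line.startswith("- [ ]")
    ]

def run_task(task: Task) -> None:
    # Placeholder for the agent editing files to satisfy this one task.
    print(f"implementing: {task.summary}")

def implement(spec_path: Path) -> None:
    for task in parse_tasks(spec_path.read_text()):
        run_task(task)
        # Commit message carries the spec reference for traceability.
        subprocess.run(
            ["git", "commit", "-am", f"{task.summary} (spec: {spec_path.name})"],
            check=True,
        )
```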
The Numbers
In the last eight weeks, we wrote 56 specs across two products. Those specs drove 181 commits, touching 869 files and producing over 56,000 lines of new code. Twenty-nine percent of those commits were generated directly by our spec-to-code workflow. The rest were human work: reviews, integration, edge cases the pipeline couldn’t handle, and the occasional “that’s not what I meant” correction.
The result is a full production platform, deployed to customers, with authentication, CI/CD, analytics workflows, a conversational AI interface, map visualizations, and API integrations. Eight weeks. A team you can count on one hand.
I want to be honest about what this is and what it isn’t. The pipeline doesn’t write perfect code. Some specs take a few days to move from draft to production. Others take a week of back and forth. The 29% automation rate means 71% is still humans thinking, reviewing, debugging, and making judgment calls. The spec doesn’t eliminate the hard work. It focuses it.
Why Specs Beat Tests as the Starting Artifact
A test tells you whether code behaves correctly. A spec tells you whether you’re building the right thing. Those are different questions, and in my experience, the second one matters more.
When we review a spec, we catch design problems before they become code. We argue about behavior, not syntax. We ask “should it work this way?” instead of “why doesn’t it work?” That shift saves days of rework because the conversation happens when changes are cheap: before anyone has written a line of code.
The spec also does triple duty. It’s the implementation guide that the pipeline follows. It’s the test plan that QA uses for manual verification. And once the feature ships, it becomes the source for our documentation. We don’t write docs after the fact because the spec already describes what the system does and why. Three artifacts for the price of one.
Tests still exist in our workflow. They just aren’t the starting point anymore. The spec comes first. The implementation comes from the spec. The tests validate the implementation. The order changed, and that changed everything downstream.
We’re Not the Only Ones
Thoughtworks put spec-driven development on their Technology Radar in 2025, calling it one of the most important practices to emerge that year. GitHub open-sourced Spec Kit. Amazon built Kiro, an IDE designed around the concept. Teams using Kiro reported getting to a working product 58% faster with 65% fewer production bugs. Red Hat found that spec-driven AI coding hit over 95% accuracy.
Birgitta Boeckeler, who leads AI-assisted software delivery at Thoughtworks, put it plainly: “AI agents are highly capable at generating code, and nearly incapable of maintaining long-term coherence without structured context.”
That matches everything I’ve seen. Give an AI a spec and it builds something coherent. Give it a prompt and it builds something that compiles.
The Waterfall Question
Someone’s going to read this and think “that’s just waterfall with extra steps.” I get it. Heavy documentation before coding sounds like 2003.
The difference is speed. In waterfall, you wrote a 200-page requirements doc, threw it over the wall, and waited six months to find out you were wrong. In our workflow, I write a spec in the morning, it’s implemented by lunch, and I’m reviewing working code before the end of the day. If the spec was wrong, I fix it and rebuild. The feedback loop that made waterfall deadly is now measured in hours, not months.
Agile killed waterfall because the gap between “decide what to build” and “see if it works” was too long. AI closed that gap. Now the bottleneck isn’t writing code. It’s deciding what code to write. Specs are how we make that decision well.
What I’d Tell Someone Starting Out
Write the spec before you write anything else. It doesn’t need to be long. It needs to be specific. Describe the behavior you want, the behavior you don’t want, and how you’ll verify it. Have someone else read it before you build. That ten-minute review will save you hours.
If you’re using AI to write code and you’re not giving it structured context, you’re working harder than you need to. The spec is the cheapest, highest-value artifact in your entire pipeline. Write it first.