Jun 5, 2026

AI assists, humans decide: how we engineer with AI

We use generative AI heavily across engineering, and we have a simple rule that governs all of it: AI assists, humans decide, and nothing ships without review.

That is not a hedge. It is the whole design. AI is a force multiplier for a team already equipped to think critically about systems, not a substitute for that thinking. It works for us not because of the tooling but because of what the tooling multiplies: a department that was already strong before any of this existed, with a deep DevOps culture, engineers who own their systems end to end, high test coverage, and a non-negotiable review process. A strong foundation multiplied gets better faster. A weak one multiplied just breaks faster.

Here is how that plays out in practice.

Skills, not guidelines

The biggest lever is encoding our standards into agent skills rather than writing rules and hoping developers follow them. A skill is domain knowledge and conventions baked directly into the AI’s context, so the output arrives aligned with how we work instead of needing constant correction.

We run two layers. A central layer holds cross-cutting governance: how we discover requirements, write specs, document decisions, run reviews. Then each platform in our monorepo has its own specialised skills. The payments skill understands transactional consistency and auditing. The frontend skill knows our React and Semantic UI patterns. The translation skill knows our phrase system and glossary, so language stays consistent across code and marketing.
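
As a rough illustration of the two layers, here is a minimal sketch of how skills could be resolved for a file in a monorepo. It is not our actual tooling: the skill names, the directory names, and the `skills_for` function are all hypothetical, and it assumes the top-level directory identifies the platform.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Skill:
    name: str
    scope: str  # "central" or a platform directory name (hypothetical)


# Hypothetical registry; names are illustrative only.
CENTRAL_SKILLS = [
    Skill("requirements-discovery", "central"),
    Skill("spec-writing", "central"),
    Skill("decision-records", "central"),
    Skill("code-review", "central"),
]

PLATFORM_SKILLS = {
    "payments": [Skill("payments-consistency-and-audit", "payments")],
    "frontend": [Skill("react-semantic-ui-patterns", "frontend")],
    "translations": [Skill("phrase-glossary", "translations")],
}


def skills_for(path: str) -> list[Skill]:
    """Return the central skills plus those of the platform that owns `path`.

    Assumes the first path segment names the platform; unknown platforms
    get the central layer only.
    """
    parts = PurePosixPath(path).parts
    platform = parts[0] if parts else ""
    return CENTRAL_SKILLS + PLATFORM_SKILLS.get(platform, [])


if __name__ == "__main__":
    for skill in skills_for("payments/ledger/post.py"):
        print(f"{skill.scope}: {skill.name}")
```

The point of the shape is that governance is always present and platform knowledge is layered on top, so updating one entry changes what every later session sees.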

The payoff compounds. New engineers ramp faster because the skills show both how we think and how we build. And when we evolve an approach, we update the skill once and the change propagates forward, so we do not have to re-teach everyone.

AI across the whole lifecycle

AI is not just a code-writing tool for us. It runs through the entire lifecycle, and each stage hands a durable artefact to the next.

Requirements and discovery. When someone proposes a feature, a conversational skill interrogates it. For a feature involving device trust, for example, it asks what counts as a new device, how trusted devices work, and what happens if the user is travelling. The skill does not decide anything. It makes sure we are asking the right questions before a line of code is written.

Product specs. Those discussions feed a skill that shapes raw requirements into a structured PRD, a couple of pages describing what the feature delivers, who benefits, and what success looks like. People validate and refine. We collaborate with Claude on the document; we do not outsource the product thinking to it.

Prototyping. This is the newest stage, and the one we are most excited about. The PRD feeds into a self-contained prototyping environment where Claude builds a real, clickable version of the screen, using our actual component library but no backend at all. It is context-first by design: before building anything, the agent reads a library describing what Road.io is, who uses it, the component catalogue, and the patterns to follow. Every prototype carries its own short context document that records why it exists and what it should demonstrate, and that document travels forward with it. Stakeholders get to click through an idea long before it is expensive to change.

Implementation. Then we feed both the PRD and the prototype into implementation. The agent writes code with two aligned sources of truth: the spec that says what to build and a working prototype that shows what it should look like. Tests are written first, code implements to spec, and requirement validation checks the result against the PRD before a human even opens the pull request. Alongside the code, we capture the decisions and trade-offs as we go.
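
One way to picture the requirement validation step is as a coverage check between PRD requirements and test results. The sketch below is an assumption about the shape, not a description of our pipeline: the `Requirement` and `CheckResult` types, the requirement ids, and the three status labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    id: str
    text: str


@dataclass(frozen=True)
class CheckResult:
    name: str
    covers: tuple[str, ...]  # requirement ids this test claims to verify
    passed: bool


def validate(requirements: list[Requirement],
             results: list[CheckResult]) -> dict[str, str]:
    """Map each requirement id to 'met', 'failing', or 'untested'."""
    status: dict[str, str] = {}
    for req in requirements:
        relevant = [r for r in results if req.id in r.covers]
        if not relevant:
            status[req.id] = "untested"
        elif all(r.passed for r in relevant):
            status[req.id] = "met"
        else:
            status[req.id] = "failing"
    return status


def ready_for_review(status: dict[str, str]) -> bool:
    """Only flag the change as ready when every requirement is met."""
    return all(state == "met" for state in status.values())
```

A report like this does not replace the reviewer. It gives the reviewer a list of requirements that are untested or failing before they start reading the diff.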

Operations. In production, AI accelerates deployment verification, log analysis, and incident response. The pattern is always the same: AI surfaces context and options, a human makes the call.

Documentation as living context

Documentation has always rotted faster than code. We flipped that by documenting as we build, and by writing for a new audience. A lot of our documentation is now written for agents as much as for people, so it can be verbose, detailed, and exploratory: the reader is something that will parse all of it, not a tired engineer skimming.

When a feature lands, it is cheap to have Claude summarise what was built, why, and what was traded off, and store that alongside the change. Months later, when someone needs to modify that system, they can point an agent at the original spec, the prototype, the code, and the decision history, and get the full narrative back. No tribal knowledge, no archaeology through git logs to reconstruct intent. That matters most exactly where the stakes are highest: payments, billing, critical infrastructure.
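
To make "store that alongside the change" concrete, here is a minimal sketch of a decision record as a small data structure rendered to plain text. The field names, file paths, and example values are hypothetical, chosen only to mirror the what, why, and trade-offs described above.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    change: str  # identifier of the change (hypothetical format)
    what: str
    why: str
    trade_offs: list[str] = field(default_factory=list)
    sources: dict[str, str] = field(default_factory=dict)  # label -> path

    def render(self) -> str:
        """Render the record as plain text that a person or an agent can read."""
        lines = [
            f"Decision record for {self.change}",
            "",
            f"What was built: {self.what}",
            f"Why: {self.why}",
            "Trade-offs:",
        ]
        lines += [f"  - {t}" for t in self.trade_offs] or ["  - none recorded"]
        lines.append("Sources:")
        lines += [f"  {label}: {path}"
                  for label, path in sorted(self.sources.items())]
        return "\n".join(lines)


if __name__ == "__main__":
    # Entirely made-up example values.
    record = DecisionRecord(
        change="example-change",
        what="Retry wrapper around an outbound call",
        why="Transient failures were surfacing as user-facing errors",
        trade_offs=["Adds latency on the failure path"],
        sources={"spec": "docs/example/prd.md",
                 "prototype": "prototypes/example/"},
    )
    print(record.render())
```

Keeping pointers to the spec and prototype in the record is what lets an agent reassemble the full narrative later without guessing where to look.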

Humans hold the merge button

Underneath all of it, one principle is absolute: we do not run autonomous agents. Nothing deploys itself, and nothing is decided without a person.

Every change to production goes through a pull request. Developers disclose and annotate any AI involvement, and they take the same full responsibility they would for hand-written code. If anything, that intensifies review: a reviewer has to check not just what the code does, but whether the AI understood the requirement in the first place. Our structured PRDs make that check tractable, because there is a spec to validate against.

A good example of the shape we like: when a data importer hits a file it cannot parse, an agent can investigate the failure and open a pull request with a proposed fix. It does the legwork of diagnosis and a first draft. Then it stops, and a human reviews, tests, and merges. The AI does the work that is tedious; the human keeps the judgement that matters.
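
The "then it stops" rule can be sketched as a merge gate. This is a minimal illustration under stated assumptions, not our real branch protection: the `PullRequest` type, the agent and reviewer names, and the rule that approvals are only ever recorded for people are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    title: str
    author: str           # a person or an agent identity
    ai_assisted: bool      # annotated by the author, human or agent
    tests_passed: bool = False
    # Assumption: only humans can be recorded as approvers.
    human_approvals: set[str] = field(default_factory=set)


def can_merge(pr: PullRequest) -> bool:
    """Allow a merge only with passing tests and an approval from a
    human other than the author. An agent can open a pull request but
    can never satisfy this gate on its own."""
    independent_reviewers = pr.human_approvals - {pr.author}
    return pr.tests_passed and len(independent_reviewers) > 0


if __name__ == "__main__":
    pr = PullRequest(
        title="Handle unparseable rows in importer",
        author="import-fixer-agent",  # hypothetical agent identity
        ai_assisted=True,
        tests_passed=True,
    )
    print(can_merge(pr))   # no human has approved yet
    pr.human_approvals.add("reviewer-a")
    print(can_merge(pr))   # a human reviewed and approved
```

The gate is deliberately indifferent to who wrote the code. The same rule applies to an agent's draft and to a hand-written change.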

This is why we hired for thinking rather than code velocity. People who reason about trade-offs and know when to say no are sceptical of AI output by default. They verify, they question, they do not accept a suggestion because a machine produced it. That scepticism, plus review, plus a test-driven approach where hallucinations simply do not survive contact with the tests, is what makes the whole thing safe.

Small tools, working together

At heart this is an old idea: small, specialised tools composed together beat monoliths. It is the Unix philosophy applied to AI. Requirements skills help us think. Spec skills capture intent. Prototyping turns a spec into something you can click. Code skills embed domain patterns. Documentation skills create the context the next stage reads. None of them work in isolation, and none of them decide anything.

What disappears is the friction between the stages: the transcription, the translation, the endless work of keeping requirement, design, code, and documentation in sync. Remove that, keep humans deciding at every seam, and you get the combination we were after. Speed without recklessness. Efficiency without giving up ownership.

That is the model we build on.