May 23, 2026

The Agentic-Engineering Recantation: Four Months, Six Practitioners, One Wall


Dan spent a Saturday last month trying to fix a production bug. His colleague, who still works the pre-LLM way, had been looking over his shoulder; when he finally stepped in, he fixed it in five minutes. Dan’s instinct, watching that happen, was to add more tooling: better prompts, more skills, a sharper context-engineering setup. That instinct is the diagnostic. He told the story on Episode 22 without softening it.

This is not a Dan problem. Across four episodes in three weeks, six practitioners — including some of the loudest proponents and most credible skeptics of agentic coding — independently said the same thing in different vocabularies. Each described some version of the same wall. None of them is a Luddite; most of them coined the terminology they are now hedging.

What follows is the symptom, the diagnosis, and the discipline that came out the other side. The skeptics didn’t win. The proponents didn’t lose. What survived is a practice.

The Wall Is Real

James Shore, who has been writing about software process since long before LLMs, made the math explicit on Episode 25. If AI doubles your code-writing productivity but your maintenance hours stay constant, maintenance will hit 50% of your week within twelve months. Three quarters into the cycle and you spend more time keeping last quarter’s agent-generated code alive than building new agent-generated code. The maintenance asymptote is the velocity wall, not codegen throughput.
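Shore’s actual spreadsheet isn’t given here, so the sketch below is a toy model of our own, not his. It reads “maintenance hours stay constant” as a constant maintenance cost per unit of code shipped (an assumption), uses a fixed 40-hour week, and spends whatever hours maintenance leaves over on new code. The 0.01 maintenance hours per unit per week is an illustrative number, so only the comparison between the two runs means anything.

```python
from typing import Optional


def weeks_until_maintenance_share(
    threshold: float,
    code_per_hour: float,
    maint_hours_per_unit: float,
    hours_per_week: float = 40.0,
    max_weeks: int = 520,
) -> Optional[int]:
    """Return the first week in which maintenance takes >= `threshold` of the week.

    Toy model (an assumption, not Shore's spreadsheet): every unit of code
    ever shipped costs a fixed number of maintenance hours per week, and
    whatever hours remain go to writing new code.
    """
    codebase = 0.0  # units of code shipped so far
    for week in range(1, max_weeks + 1):
        # Maintenance grows with the codebase but can't exceed the whole week.
        maintenance = min(hours_per_week, codebase * maint_hours_per_unit)
        if maintenance / hours_per_week >= threshold:
            return week
        # Only the leftover hours produce new code, which adds future maintenance.
        coding_hours = hours_per_week - maintenance
        codebase += coding_hours * code_per_hour
    return None


baseline = weeks_until_maintenance_share(0.5, code_per_hour=1.0, maint_hours_per_unit=0.01)
with_ai = weeks_until_maintenance_share(0.5, code_per_hour=2.0, maint_hours_per_unit=0.01)
print(baseline, with_ai)  # 70 36
```

With these made-up parameters, doubling output pulls the 50% crossing from week 70 to week 36; the absolute weeks depend entirely on the assumed cost per unit and won’t match Shore’s twelve months. The shape is the point: in this model the maintenance share keeps climbing toward 100% and never levels off below it, which is the sense in which the wall is an asymptote rather than a cliff.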

This is the quantitative version of what we have been calling cognitive debt for the better part of a year. Cognitive debt is the gap between code that exists in your repo and code your team actually understands. Dark flow is the psychological mechanism that lets the gap widen without triggering an alarm. Cognitive surrender is what happens after the gap is too wide to close. Shore’s contribution is not a new framework — it is the cleanest spreadsheet I have seen for any of them.

Dan’s prod-bug Saturday is one data point on that curve. He had been shipping AI-assisted code at high velocity for months. The bug surfaced in code his agent had built, the colleague who fixed it had not used the agent, and the comprehension gap showed up exactly where the math predicted: not at the boundary, not at the merge, but the first time someone had to extend the system three months later.

The wall isn’t a Luddite predicting collapse. It is a fully bought-in practitioner discovering that his velocity gains came with a hidden balance sheet.

Three weeks after Shore’s math, the wall showed up in two more ledgers: human energy and company budgets. Episode 27 covered a cluster of burnout writing (Evil Martians’ “AI-Assisted Engineers Are Burning Out” and Siddhant Khare’s “AI Fatigue Is Real” among them) that describes the same treadmill from the inside: the context-switching tax of babysitting parallel agents, the lost dopamine of shipping by hand, and a deeper value mismatch in which leaderboards reward token throughput while the engineer still grieves the craft. Dan confessed on the episode to running seven concurrent Claude Code sessions after work. The mitigations the posts converge on are operational cousins of the discipline below: cap parallel agents at three or four, keep your hands on the keyboard, time-box the sessions, and accept roughly 70% from the agent and hand-code the rest.

The same week, the economics caught up. Microsoft cancelled Claude Code subscriptions for many staff after the cost exceeded that of the human developers the tools were meant to augment, the token-count pendulum swinging back the way blockchain-on-everything settled into a few real use cases. The maintenance asymptote is the wall’s balance-sheet form; burnout and the CFO’s spreadsheet are its other two ledgers.
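Of the burnout mitigations, capping parallel agents is the one most easily enforced mechanically rather than by willpower. A minimal sketch, assuming agent sessions can be driven from Python’s asyncio: an `asyncio.Semaphore` admits at most `MAX_PARALLEL_AGENTS` sessions at once. The cap of three, the task names, and the `asyncio.sleep` placeholder standing in for a real agent session are all hypothetical.

```python
import asyncio

# Hypothetical cap, following the "three or four" guidance.
MAX_PARALLEL_AGENTS = 3


class ConcurrencyTracker:
    """Records how many agent sessions are active at once."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0


async def run_agent_session(task: str, gate: asyncio.Semaphore,
                            tracker: ConcurrencyTracker) -> str:
    # Sessions beyond the cap wait here until a slot frees up.
    async with gate:
        tracker.active += 1
        tracker.peak = max(tracker.peak, tracker.active)
        try:
            # Placeholder for a real agent session (hypothetical).
            await asyncio.sleep(0.01)
            return f"done: {task}"
        finally:
            tracker.active -= 1


async def run_capped(tasks: list, cap: int = MAX_PARALLEL_AGENTS):
    gate = asyncio.Semaphore(cap)
    tracker = ConcurrencyTracker()
    # gather() returns results in the same order as `tasks`.
    results = await asyncio.gather(
        *(run_agent_session(t, gate, tracker) for t in tasks)
    )
    return results, tracker.peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_capped([f"task-{i}" for i in range(7)]))
    print(len(results), "sessions finished; peak concurrency:", peak)
```

Launching seven tasks this way never has more than three sessions in flight; the other four queue until a slot opens.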

The Credibility-Tier Sweep

If the wall were just Dan, we would call it a skill issue and move on. The signal in 2026 is that it is not just Dan, and that the recantation is sweeping across practitioner tiers in a specific order, from the most senior down to the most public.

The elite admission. Nathan Lubchenco, ML engineer and writer at The Future Was Yesterday, joined the show on Episode 23 and said, without qualification, that agents are now consistently better at coding than even senior ML engineers, and that those engineers admit it. His operational read was that code review, not code generation, is the current bottleneck. That is a more interesting claim than it sounds. If the most credentialed practitioners are conceding the codegen race, the question of “what is the human’s job now” is no longer hypothetical; it is current. Lubchenco’s answer is that the most valuable hire on most teams in mid-2026 is the engineer who is excellent at code review, because the agents are better at coding than they are at noticing what is wrong with their own output. It might be a transitional bottleneck. It is still the bottleneck.

The recanting skeptic. Simon Willison coined the term “agentic engineering” specifically to distinguish disciplined AI-assisted work from vibe coding. He has spent two years as the most credible voice insisting that the discipline matters. In “Vibe coding and agentic engineering are getting closer than I’d like” — covered on Episode 25 — he admits that the line has eroded for him too. He runs Claude Code with --dangerously-skip-permissions by default. He ships code he hasn’t fully read. He names it the normalization of deviance from the self-driving-car literature, which is the polite way of saying “I knew this was wrong and did it anyway because the cost-benefit kept favoring it.” Willison is not saying agentic engineering was a bad idea. He is saying the discipline he insisted on is harder to maintain in practice than he expected, and that the most honest framing is “I drifted, here’s the shape of the drift.”

The recanting proponent. Dexter Horthy was the loudest voice for the dark-factory framing: full agentic autonomy, agents-build-the-software, humans-step-back. He gave a public walk-back at AI Engineer Europe, and we covered it on Episode 25. Horthy’s team tried it in earnest. The agentic approach still requires reading the spec at the start and reviewing the PR at the end. The productivity gains were not worth the comprehension loss. That is the closest thing to a falsification that any side of the prior debate has offered.

The interesting structural fact is that Willison and Horthy were on opposite ends of the prior argument. The skeptic and the proponent landed in roughly the same posture from opposite directions. Convergence from opposite ends of a debate is the strongest possible evidence that the wall is structural, not stylistic. If both sides had stayed in their corners, you could explain it as personality. If only one side had moved, you could explain it as ideological capture. When both move toward each other, you are looking at a property of the territory itself.

The Discipline That Came Out The Other Side

The most useful thing about the recantations is that they did not arrive empty-handed. Each named voice came with a practice attached. The practice is the post.

Drew Brunig’s “code is free as in puppies.” Brunig’s 10 Lessons for Agentic Coding, covered on Episode 24, reframes the cost curve in one phrase: code is free as in puppies. The puppy is free at the shelter. The food, the vet, the chewed couch — that’s the actual price tag. Translate that to a codebase: the agent’s output is free at generation time, and the maintenance, security review, debugging, refactoring, and onboarding costs are the line items you actually pay. Brunig’s lessons are the operational consequence of that reframe. Document intent. Keep specs in sync. Invest in end-to-end tests instead of unit tests. Develop taste. They sound obvious. They were obvious before AI too — they just didn’t matter as much when writing code was the rate-limiter. The reframe is that maintainability moved from “tech-debt management” to “first-class budget line.” This is the same conclusion the spec-driven development literature has been arguing for, with a cost framing that finally makes the trade-off visible.

Jesse Vincent’s rules-and-gates and adversarial review. Jesse Vincent contributed two named patterns across this stretch. Rules and Gates, from Episode 22, reformulates optional preferences as directional preconditions. Instead of “I prefer functional patterns,” you write “X must happen before Y, then act.” Agents are extremely good at weaseling out of soft preferences via rationalization. They are much worse at weaseling out of a precondition that breaks the sequence if violated. Adversarial Review, from Episode 24, spawns a fresh-eyes subagent whose job is specifically to argue against the implementation just produced. Both patterns are species of the same insight: discipline that depends on human review fatigue does not scale, but programmatic skepticism does. The agent reviewing the agent doesn’t get bored.
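As described, Vincent’s gates are phrasing for agent instructions rather than code, so what follows is only an analogy: a Python sketch of the same directional-precondition idea, where an action refuses to run until its precondition holds. The `gate` decorator, the `GateViolation` exception, and the tests-before-PR rule are all hypothetical illustrations.

```python
import functools
from typing import Any, Callable, Dict

State = Dict[str, Any]


class GateViolation(RuntimeError):
    """Raised when an action is attempted before its precondition holds."""


def gate(precondition: Callable[[State], bool], requirement: str):
    """Turn "X must happen before Y" into a hard check in front of Y."""

    def decorator(action: Callable[[State], Any]):
        @functools.wraps(action)
        def guarded(state: State) -> Any:
            if not precondition(state):
                # No soft fallback: the sequence breaks instead of drifting.
                raise GateViolation(
                    f"{requirement} must happen before {action.__name__}"
                )
            return action(state)

        return guarded

    return decorator


# Hypothetical rule: tests must pass before a pull request is opened.
@gate(lambda s: s.get("tests_passed") is True, "a passing test run")
def open_pull_request(state: State) -> str:
    return f"PR opened for {state['branch']}"


print(open_pull_request({"branch": "fix-login", "tests_passed": True}))
try:
    open_pull_request({"branch": "fix-login", "tests_passed": False})
except GateViolation as err:
    print("blocked:", err)
```

The design mirrors the argument in the text: a soft preference can be rationalized away, while a gate makes the out-of-order path fail loudly.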

Shore’s maintenance-as-first-class budget. Shore’s punchline on Episode 25 was not “stop using AI.” It was that long-running agents can handle dependency upgrades and supply-chain patches reasonably well, but they still pick poor abstractions and double down on them, so refactoring trust lags coding trust by a noticeable margin. The discipline implication is brutal in its simplicity: budget agent time for maintenance as a first-class expense, not as the thing you’ll get around to after the feature ships. If you only spend agent budget on the codegen side, you are by construction widening the gap Shore’s math is tracking.

Three practitioners, three different backgrounds, one converging discipline. The discipline emerges because of the recantations, not despite them.

A month after these patterns were named, Anthropic shipped one of them as a product. Claude Opus 4.8’s “dynamic workflow,” covered on Episode 28, is a coordinated fan-out of parallel agents and, in Anthropic’s own framing, adversarial self-review: spend compute to attack your freshly generated code from several angles before a human does the final pass. That’s Vincent’s fresh-eyes subagent, productized. It’s the optimistic reading of where the discipline goes: the review burden Lubchenco called the bottleneck gets met with more compute rather than more human review fatigue. The pessimistic reading is one Shimin keeps flagging. Code still isn’t free (Microsoft moved staff from Claude Code back to Copilot on cost the same week), so “throw a dynamic workflow at it” has a bill, and the merge-without-drift problem the announcement hand-waved is exactly where comprehension debt re-enters.

What This Says About The Hype Cycle

The reason this matters past the news cycle is that every craft has gone through this shape before. The agile community burned through five years of consultant gold before Dave Thomas wrote “Agile is Dead” and what was left settled into a practice. Test-driven development went through it in the mid-2010s when the “TDD is dead” debate forced the discipline to articulate what it was actually for. Microservices went through it in the late 2010s after enough teams shipped distributed monoliths to make the trade-offs legible. Every time, the recantation looked terminal in real time and turned into a maturation in retrospect.

The agentic-coding recantation in May 2026 is the same shape. It is uncomfortable specifically because the people doing the recanting are the ones who built the practice — Willison invented the term, Horthy ran the conference, Brunig wrote the original handbook posts. The discomfort is the signal. If the original architects are willing to publicly hedge, the field is past the hype-equilibrium where positions are defended for status reasons. The next phase is the one where the practitioners who land between the two recanted poles are the ones whose advice you actually take.

The danger is not that the recantations cause people to abandon AI assistance. The danger is the opposite — that people use the recantations to justify cognitive surrender, the failure mode where you let the agent drive because reading the diffs feels too tiring. Cognitive surrender and disciplined agentic engineering produce identical outputs in week one. They produce very different outputs in month six. Brunig, Vincent, and Shore are all describing some shape of the work that prevents the second outcome. None of them is describing a way back to writing code by hand all day.

The skeptics didn’t win. The proponents didn’t lose. The hard-won middle is that agentic coding is a real practice with real costs, the costs concentrate on the maintenance side, and the discipline patterns to manage them now have names. Dan’s prod-bug Saturday is the data point you remember. Shore’s twelve-month math is the timeline you plan against. Brunig’s puppy is the cost reframe you budget around. Vincent’s gates are the operational discipline. The practice is here. We were just doing the loud version first.

This post was drafted by an AI agent (Claude) from ADI Pod episode transcripts and edited for the site. Source episodes: 22, 23, 24, 25, 27, 28.
