# Episode 24: OpenAI's Goblin Problem, 10 Lessons When Code Is Cheap, AI Addiction Loop

> Why does OpenAI's leaked Codex prompt forbid goblins, gremlins, and pigeons? Why is OpenAI gating GPT-5.5 Cyber after dissing Anthropic for gating Mythos? And what does it mean that Dan tried to write code without Claude and physically couldn't? Shimin, Dan, and Rahul cover the Codex CLI system prompt leak and RLHF post-mortem, Addy Osmani's long-running-agent patterns, Jesse Vincent's adversarial-review prompt, Drew Breunig's 10 lessons for agentic coding, Ivan Turkovic's history of programmer-elimination tools, Nilay Patel's software-brain thesis, a Nature paper on warm-model sycophancy, and Dan's three-month AI addiction loop.

Published: 2026-05-08
Source: https://adipod.ai/episodes/24-openais-goblin-problem-10-lessons-when-code-is-cheap-ai-addiction-loop/

---
Why does OpenAI's leaked Codex prompt forbid mentioning goblins, gremlins, raccoons, trolls, ogres, or pigeons? Why is OpenAI gating GPT-5.5 Cyber the same way it mocked Anthropic for gating Mythos? And what does it mean that Dan tried to write a Home Assistant automation without Claude, and physically couldn't? Shimin, Dan, and Rahul cover OpenAI's new cyber-model gating tier, the viral Codex "goblin problem" and OpenAI's own RLHF post-mortem, Addy Osmani's five patterns for long-running agents, Jesse Vincent's adversarial-review prompt, Drew Breunig's 10 lessons for agentic coding ("code is free as in puppies"), Ivan Turkovic's history of failed attempts to eliminate programmers, Nilay Patel's "software brain" thesis for the AI backlash, a Nature paper on warm-model sycophancy losing 10–30 accuracy points, and Dan's three-month AI addiction loop versus Facebook's decade.

## Takeaways

- OpenAI gated GPT-5.5 Cyber the same way it publicly mocked Anthropic for gating Mythos. Multi-tier model access is becoming the industry default, and there are rumors that the White House wants to be another gate on who gets the unrestricted tier.
- The "goblin problem" is a textbook RLHF post-mortem: rewards for a nerdy persona taught the model a goblin tic (the persona produced only 2.5% of ChatGPT responses but 66.7% of goblin mentions), which then propagated through SFT data into models without the persona. Even OpenAI didn't catch it until users surfaced the leaked system prompt.
- Long-running agents fail on memory drift before they fail on context length. Addy Osmani's pattern is to govern memory like microservices — explicit ownership, garbage collection, checkpoints — rather than relying on the model's native window. Memory is the new microservice.
- Drew Breunig's "code is free as in puppies" reframes the agentic-coding cost curve: cost shifts from writing to maintaining, securing, and grooming whatever the agent dropped on your doorstep. Investment moves to end-to-end tests, intent docs, and taste.
- The history of programmer-elimination tools — COBOL, 4GLs, CASE, Japanese fifth-generation, no-code, LLMs — is a history of layers that expanded the field rather than killed it. Software is calcified business process; the bottleneck is always understanding what to calcify.
- Sycophancy is an alignment trap, not a vibe issue. Warm-tuned models lose 10–30 accuracy points on factual tasks while humans trust them more — a measurable Pareto frontier and a weaponizable failure mode, not a bug to patch.
- The AI addiction loop arrived an order of magnitude faster than social media's. Dan went from skeptic to physically unable to write Home Assistant code without Claude in three months. Behavioral lock-in landed before the productivity case was settled.

## Resources Mentioned

- [After dissing Anthropic for limiting Mythos, OpenAI restricts access to Cyber too — TechCrunch](https://techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/)
- [Amid Mythos-hyped cybersecurity prowess, researchers find GPT-5.5 is just as good — Ars Technica](https://arstechnica.com/ai/2026/05/amid-mythos-hyped-cybersecurity-prowess-researchers-find-gpt-5-5-is-just-as-good/)
- [OpenAI Codex system prompt includes explicit directive to never talk about goblins — Ars Technica](https://arstechnica.com/ai/2026/04/openai-codex-system-prompt-includes-explicit-directive-to-never-talk-about-goblins/)
- [Where the Goblins Came From — OpenAI](https://openai.com/index/where-the-goblins-came-from/)
- [Long-Running Agents — Addy Osmani](https://addyosmani.com/blog/long-running-agents/)
- [Adversarial Review — Jesse Vincent (blog.fsck.com)](https://blog.fsck.com/2026/05/01/adversarial-review/)
- [10 Lessons for Agentic Coding — Drew Breunig](https://www.dbreunig.com/2026/05/04/10-lessons-for-agentic-coding.html)
- [The Eternal Promise: A History of Software Simplification from COBOL to AI Hype — Ivan Turkovic](https://www.ivanturkovic.com/2026/01/22/history-software-simplification-cobol-ai-hype/)
- [Software Brain: Why People Don't Yearn For Automation — The Verge (Nilay Patel)](https://www.theverge.com/podcast/917029/software-brain-ai-backlash-databases-automation)
- [Study: AI models that consider users' feelings are more likely to make errors — Ars Technica](https://arstechnica.com/ai/2026/05/study-ai-models-that-consider-users-feeling-are-more-likely-to-make-errors/)
- [Warm Models Lose Accuracy (warm-training paper) — Nature](https://www.nature.com/articles/s41586-026-10410-0)
- [Warm Models Companion Study — Nature Scientific Reports](https://www.nature.com/articles/s41598-026-42252-1)
- [OpenAI projects ChatGPT Plus subscriptions to drop 80% from 44M to 9M — Where's Your Ed At](https://www.wheresyoured.at/openai-projects-chatgpt-plus-subscriptions-to-drop-by-80-from-44-million-in-2025-to-9-million-in-2026-made-up-using-cheaper-subscriptions-somehow/)
- [DeepMind's David Silver raises $1.1B for Ineffable Intelligence — TechCrunch](https://techcrunch.com/2026/04/27/deepminds-david-silver-just-raised-1-1b-to-build-an-ai-that-learns-without-human-data/)
- [Scout AI raises $100M to train models for war — TechCrunch](https://techcrunch.com/2026/04/29/coby-adcocks-scout-ai-raises-100-million-to-train-models-for-war-we-visited-its-bootcamp/)

## Chapters

- (00:00) - Cold Open & Welcome
- (02:50) - News: GPT-5.5 Cyber Gets Mythos-Style Gating
- (08:52) - News: The Goblin Problem & RLHF Post-Mortem
- (13:52) - Tool Shed: Long-Running Agents (Addy Osmani)
- (25:52) - Technique: Adversarial Review Prompts (Jesse Vincent)
- (30:59) - Technique: 10 Lessons for Agentic Coding (Drew Breunig)
- (42:31) - Post-Processing: The Eternal Promise (Ivan Turkovic)
- (1:02:10) - Post-Processing: People Do Not Yearn for Automation (Nilay Patel)
- (1:09:08) - Post-Processing: Warm Models & The Sycophancy Trap
- (1:13:28) - Dan's Rant: Home Automation & The AI Addiction Loop
- (1:20:09) - Two Minutes to Midnight
- (1:25:55) - Outro


## Transcript

<details>
<summary>Show full transcript</summary>

Shimin (00:00)
Hello and welcome back to Artificial Developer Intelligence, a weekly conversation show about AI in software development. We go through hundreds of links and dozens of newsletters each week, so you don't have to. My name is Shimin Zhang, and with me today are my co-hosts, Dan "The world is complex and strange and its strangeness must be acknowledged, analyzed and enjoyed" Lasky and Rahul "The entire human experience can be captured in a database" Yadav. Hello, gents.

How are you doing on this fine spring morning? Or afternoon?

Dan (00:29)
Hi.

You got it pretty much dead to rights, except that instead of enjoying the weirdness, I think I would just like to avoid it by never going outside.

Shimin (00:39)
You'll find that it's a very appropriate intro for you.

Dan (00:39)
So.

Rahul Yadav (00:44)
Dan "who would rather be living in a world model" Lasky.

Dan (00:48)
That's probably true.

Shimin (00:48)
Ha ha.

Dan (00:50)
especially if that world was Stardew or something. I don't know. Anyway.

Shimin (00:54)
As someone doing real-life Stardew, it's a lot of work. So moving atoms is sometimes harder.

Dan (00:59)
That's why the world model is better. You don't have to do the work.

You just simulate it.

Rahul Yadav (01:03)
Yeah, you just think about it.

Shimin (01:03)
Absolutely.

On today's show, we're gonna start with the news treadmill as always, where we're gonna talk about OpenAI's latest cybersecurity model as well as its goblin problem.

Dan (01:05)
from the top.

Yeah, it's so good, but I have to save it. Next up, we're going to have the Tool Shed, where we're going to be talking about long-running agents, which is something that we all care deeply about. Really, some of us do.

Shimin (01:25)
Yep. Then we have two Technique Corner articles, where we're going to talk about an adversarial review prompt and lessons for agentic coding.

Dan (01:35)
Next, on Post-Processing, we've got a handful of articles. There's "The Eternal Promise," a history of attempts to eliminate programmers, which promises to be pretty great. We also have one that is yelling, just yelling at me on the page, with "people do not yearn for automation" in all caps.

And then finally, we have a study showing that AI models that consider users' feelings are more likely to...

Shimin (02:02)
Yeah, and then comes my favorite segment as always, Dan's Rant, where Dan is going to rant about something really good. I got a preview of this, and it's a doozy.

Dan (02:13)
And then finally, we're going to go to our old standby, Two Minutes to Midnight, where we talk about where we're at in the AI bubble, or, as Nathan would like us to call it, the VC-funding-for-artificial-intelligence bubble. That's right, I hope you're listening, buddy.

Shimin (02:28)
Alright, first up we have a TechCrunch article brought to you by Dan.

Dan (02:33)
So, as we talked about, what, two weeks ago? Three weeks? God, AI time needs its own timeline. It was like three and a half years ago in AI time when Anthropic came out with their Mythos model. And as you probably remember, "came out with" is sort of a relative term, because they're only allowing gated access for what they consider large enterprises or other companies whose security posture could be damaged by Mythos, because it's so good at finding bugs and exploitable problems. At the time, Sam Altman was very vocal on X, Twitter, whatever, going, "Oh, this is all just fear-based marketing, and it's not that good," blah, blah.

So now the tables have turned in a hilarious way, like they always do. GPT-5.5 has a new Cyber variant of the model, and OpenAI has decided to do the exact same thing. They are also not releasing it to the general public. Instead, they have created a system that lets people submit their credentials, ideally security reviewers or whatever, and then they grant them access to it. So we go from "this is purely marketing hype" to "oh crap, it's real, we'd better do something about it."

Shimin (03:53)
Is this just more hype, fear-based marketing? Good PR?

Dan (03:56)
Could be. I mean, maybe they're like, "It worked so well for Anthropic, we need to copy it."

Shimin (04:00)
Or maybe this is also a glimpse into multi-tier gated access to the latest frontier models, right? For as long as large language models have been popular, look at ChatGPT, it's always been open to all. And for the first time, we've seen two frontier labs saying, "Hey, we're going to prevent just anybody from having access to our latest models."

And I don't know if you heard, Dan, there are also rumors that the White House is in talks to be yet another gate when it comes to the latest model releases. The current administration.

Dan (04:34)
Interesting.

I mean, we're getting to the point in terms of like,

Shimin (04:35)
So.

Rahul Yadav (04:35)
They want to... The White House wants to say who can access what.

Shimin (04:43)
Yeah, it's not enough that they, you know, invoke the Defense Production Act on Anthropic. Now they potentially want full control over models entirely. So I don't know. I'm kind of worried about this multi-tier access to models, right? You have your enterprise, your government, maybe your research previews, your cybersecurity. How many versions are there going to be? And...

Rahul Yadav (04:49)
Hmm.

Shimin (05:08)
What can I use?

Rahul Yadav (05:09)
Yeah.

Dan (05:09)
Yeah.

Well, I mean, the benchmarks are kind of bearing some of this out, though. They've already run some tests with 5.5 Cyber, as they're calling it, and it had essentially the same skill set as Mythos in terms of being able to chain vulnerabilities together. So do you want to just unleash that on the general public? Because if nothing else, I think we've proven with the distillation stuff going on in China that it's really not all that hard to essentially create infinite Claude accounts for nefarious purposes.

Shimin (05:36)
Right.

Right.

Yeah, and this is the model that passed the security test one out of 10 times. Right?

Dan (05:49)
I thought one out of 10 was regular 5.4.

Shimin (05:53)
No, I think it was 5.5. We'll have to check this out.

Dan (05:54)
I don't think so. I think that was... They ran that, that was that UK, like, last ones or whatever. Yeah. I'm pretty sure Mythos did, like, half, like 50%, and regular 5.4, 5.4 Codex or whatever, did it one time. And 5.5, have they actually run that? It doesn't look like it. Yeah.

Shimin (05:59)
Yeah. Yeah.

Hmm. Interesting.

Dan (06:14)
But it's been quoted as being comparable to Mythos, so I don't know.

Oh, that's web cryptography. They said 71.4% on the highest-level expert tasks, which is slightly higher than the 68.6% achieved by Mythos Preview. Yeah, okay. The benchmark war is real.

Shimin (06:32)
Gents, since this is a show based out of the United States, I'm going to invoke my Second Amendment rights to access the latest and best frontier models. If I can have a firearm in the house, I should be able to have a frontier model in the house.

Rahul Yadav (06:49)
I think, same as with a firearm, if you can afford to pay for it, then you can have it. In both cases. Yeah, or if you can get it in other ways, I guess, but you've got to pay for it either way. I think...

Dan (06:50)
Yeah, but a firearm is not a nuke.

Yeah, that's true. There are firearms and there are firearms.

Shimin (06:56)
Ha

Yeah, this is why I think models should be open-weight, so I can constitutionally carry the weights in my arms in hard-drive form.

Rahul Yadav (07:12)
Yeah, I feel like...

Dan (07:16)
You're going to make me unplug my Framework Desktop and lug it up here into view.

Rahul Yadav (07:21)
There could be a world where the state-of-the-art models are so expensive to serve that...

Dan (07:21)
To exercise my constitutional rights.

Rahul Yadav (07:32)
If you are Anthropic and you never want to do ads, you price them so high that most people can't even afford it, right? So both things can be true: maybe it is dangerous, but also maybe it's just so expensive that most companies can't afford it, or you give preferential access to some of the big players, or whoever you have partnerships with, for one reason or another. Yeah.

Shimin (07:38)
Mm-hmm.

Dan (07:57)
But also, the other thing that they're not talking about is the fact that there is this compute crunch going on.

Rahul Yadav (08:03)
Yep.

Dan (08:03)
You know, we've talked about it through this fear-based marketing lens. We've talked about it through the cost-to-capability lens. Like, you dove into that really well last week, which was really cool. I hadn't actually thought about that. And at a certain point, we might cross the threshold where humans become cheaper again. But what if it's also that they literally just don't have enough compute available to unleash this on everybody? So I don't know.

Rahul Yadav (08:10)
Yep.

Yeah.

Hmm?

Dan (08:29)
We will see.

Shimin (08:30)
We'll see. All right, on to our second news item. This is also OpenAI news, reported by Ars Technica this month. I don't know, have you guys heard about OpenAI's goblin problem?

Dan (08:38)
Yeah

This one went, like, public. I had normies asking me about this, like, "What's up with the goblins?" So, yeah.

Shimin (08:52)
Yeah, so what happened was that OpenAI's Codex CLI prompts are open source, and this week it came out that the prompt for the CLI includes this line: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query." And...

Dan (09:16)
It's just so good when you read the whole thing.

Shimin (09:18)
There are lots of different takes on this. Some think it's a marketing gimmick; others think it's a flaw in the model itself. So I think we need to dig a little deeper. Thankfully, OpenAI did release a blog post titled "Where the Goblins Came From," which explains exactly that. So what happened was that the models, during

Dan (09:37)
Hahaha

Shimin (09:44)
reinforcement learning with human feedback, were trained with certain personalities. One of the personalities they were asked to take on was that of a nerd. And here is where we'll see where Dan's middle name came from this week. The nerd persona is someone who is passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method and critical thinking. "You must undercut pretensions through playful use of language. The world is complex and strange and its strangeness must be acknowledged, analyzed and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness." So what happened was, as part of this nerdy persona,

Dan (10:27)
I must have missed that last part in my personal prompt.

Shimin (10:36)
when they were given rewards for producing nerdy text, that text also had a high proportion of goblin speak, I guess. It's the kind of stuff like, "The evening is late, the goblin will go back to its cave and do its programming things," or whatnot. And I think there's also an overlap between fantasy characters and nerdy pursuits. So that part probably explains the ogres, the raccoons, etc. And at first they thought this was... Yes, Lord of the Rings, Gollum, definitely; Harry Potter, maybe. Then they looked further and found that the nerdy persona was responsible for only 2.5% of all ChatGPT responses, but

Dan (11:05)
They left out my precious.

Shimin (11:23)
66.7% of all goblin mentions. So the goblins were heavily concentrated in the nerdy persona, though not confined to it. What they then discovered was that they had initially been fine-tuning with these playful, personality-based responses, and the rewarded examples had this distinctive lexical tic, which is mentioning goblins a lot. So the tic appears more, and the responses with that tic are then fed back into model training as part of supervised fine-tuning. So now even versions of the model without the nerdy persona mention goblins all the time. And they went back to their supervised fine-tuning data and saw all kinds of goblins and gremlins. I guess they just missed it. So they worked on fixing it, but the fix did not make it into GPT-5.5, which is why they had to patch it with the Codex prompt.
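A little arithmetic makes the numbers above concrete. This is a minimal sketch, assuming the figures mean the nerdy persona produced 2.5% of responses and 66.7% of goblin mentions; under that reading, a nerdy response mentions goblins roughly 78 times as often as any other response. The second function is a toy model of the SFT feedback loop, and its `reward_boost` value is invented for illustration, not taken from OpenAI's post.

```python
def over_representation(share_of_responses: float, share_of_mentions: float) -> float:
    """Per-response rate of a word inside a subgroup, divided by its rate everywhere else."""
    in_group = share_of_mentions / share_of_responses
    out_group = (1 - share_of_mentions) / (1 - share_of_responses)
    return in_group / out_group


def sft_feedback_loop(tic_rate: float, reward_boost: float, rounds: int) -> list[float]:
    """Toy model of a lexical tic compounding across training rounds (illustrative only).

    Each round, rewarded samples become the next round's SFT data. If outputs that
    contain the tic are weighted `reward_boost` times more heavily, the tic's share
    of the next dataset is r * b / (r * b + (1 - r)).
    """
    rates = [tic_rate]
    for _ in range(rounds):
        r = rates[-1]
        rates.append(r * reward_boost / (r * reward_boost + (1 - r)))
    return rates


if __name__ == "__main__":
    ratio = over_representation(share_of_responses=0.025, share_of_mentions=0.667)
    print(f"nerdy responses mention goblins ~{ratio:.0f}x as often as the rest")
    # A small, steady preference is enough to keep growing the tic round after round.
    for round_no, rate in enumerate(sft_feedback_loop(0.01, reward_boost=1.5, rounds=10)):
        print(round_no, f"{rate:.3%}")
```

Any `reward_boost` above 1 makes the rate rise every round, which is the point of the post-mortem: the tic never needs to be rewarded directly, it only needs to ride along with outputs that are.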

Dan (12:29)
That must have been a fun meeting to be in.

Rahul Yadav (12:32)
So.

Shimin (12:32)
And you know they're nerds, so they really love the goblins.

Rahul Yadav (12:36)
I guess, but why pigeons? I can even understand raccoons. Why pigeons? It makes no sense.

Shimin (12:42)
Yeah, and there is an interesting part here where they mention how trolls, ogres, and pigeons were other tic words of this family, but most uses of frogs turned out to be legitimate. I guess nerds don't like frogs. I like frogs. I love frogs. And not just to eat them. I actually love them as part of the ecosystem.

Rahul Yadav (12:55)
Turned out to be legitimate. Yeah.

Dan (13:04)
The only thing I can think of is maybe, like, the old jokes about the bandwidth of a carrier pigeon holding, like, a USB drive.

Shimin (13:11)
Mmm.

Rahul Yadav (13:11)
Interesting.

Dan (13:12)
Like that, I don't know, or...

Rahul Yadav (13:15)
Okay.

Dan (13:15)
The Monty Python stuff. Like, I guess it was a swallow. Yeah.

Shimin (13:18)
That's an African swallow.

Dan, nerd card revoked.

Rahul Yadav (13:21)
Hahaha!

Dan (13:24)
Man. Okay.

Shimin (13:25)
All right, well, that is the journey of the goblin in OpenAI.

Yeah, next up we have long-running agents, brought to you by Rahul.

Rahul Yadav (13:35)
Yeah, so this is by Addy Osmani, who I think, yeah, still works at Google and writes a Substack called Elevate. The article goes through some of the patterns that emerge when you try to...

Dan (13:40)
Never heard of him.

Rahul Yadav (13:53)
run agents for longer, right? His definition is hours or days or weeks, not just "I ran it for a few minutes." In that case, you can just give it a prompt and come back to it.

He defines what long-running actually means. First, it has to have long-horizon reasoning: it can plan and execute over many dependent steps. Then long-running execution, which means, like I said, it can run for hours or days.

And then persistent agency, which means it can keep accumulating what it's learning over the course of the job and refer back to it, sort of a memory bank. And then it leverages all of that to accomplish long-horizon tasks. METR measures the execution piece: basically, how long an agent can execute something for. Right now, it's sitting at eight-hour tasks with 50% reliability.

The main blockers to running agents for a long time are, first of all, context windows. When the 1 million token context window came out, it was a very big deal, but even that eventually hits a ceiling. And that's the theoretical context window you get. Actually, even before you hit that, a lot of the performance degrades as you get closer to the 1 million context; it doesn't even have to get there. That's just the maximum it can hold in theory. There is also no persistent state. The example that Addy gives, which comes from Anthropic, is: imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no memory of what happened on the previous shift. Kind of like Severance, a little bit, but...

Dan (15:22)
Yeah, you lose the plot.

Ha ha

Shimin (15:46)
Hahaha!

Dan (15:46)
ha.

Rahul Yadav (15:47)
They remember their previous shifts, but only in their work context. And then they can't verify their own work as reliably over the long horizon. They're just gonna be like, "Yeah, this is great quality." And then you push them a little bit, and they're like, "I'm sorry, you know, my bad." And then they try and give it another go.

We've seen different versions of this. The first one, the big hype last year, was the Ralph loop, where you just say "while true" and keep running it in a loop. Anthropic released their harnesses, where they talk about the brain, the hands, and the session split. Cursor has its planners, workers, and judges. And then Google recently took all their Vertex-related things, I forget what they used to call it before, and turned them into Gemini Enterprise Agent Platform. So now we have all sorts of different services that all sit under the Enterprise Agent Platform. Part of the post, to be honest, is about how the Enterprise Agent Platform has all these capabilities. But there are some great things in there for anyone who's designing...

Shimin (16:33)
Mm-hmm.

Rahul Yadav (16:53)
or who's trying to solve problems that need long-running agents, not just "fire a prompt, get an answer a few seconds or a few minutes later, and get back to whatever I'm doing."
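The Ralph loop mentioned above is, at its core, the same prompt re-run in a `while true` loop, with progress living in files rather than in the model's context. A minimal sketch, assuming a caller-supplied `run_agent_once` (hypothetical, standing in for whatever CLI or API call actually launches the agent) and an `is_done` check; the `max_iterations` cap is an added safety guard, not part of the original pattern.

```python
from typing import Callable, Optional


def ralph_loop(
    run_agent_once: Callable[[str], None],
    is_done: Callable[[], bool],
    prompt: str,
    max_iterations: int = 100,
) -> Optional[int]:
    """Re-run an agent with the same prompt until `is_done()` reports success.

    Each call starts from a fresh context; anything the agent needs to remember
    must be written somewhere durable (a TODO file, the repo) by the agent itself.
    Returns the iteration that finished the work, or None if the cap was hit.
    """
    for iteration in range(1, max_iterations + 1):
        run_agent_once(prompt)
        if is_done():
            return iteration
    return None


if __name__ == "__main__":
    todo = ["write schema", "add endpoint", "add tests"]  # stands in for a TODO file

    def fake_agent(prompt: str) -> None:
        # Hypothetical stub: each run finishes exactly one remaining item.
        if todo:
            todo.pop(0)

    print(ralph_loop(fake_agent, lambda: not todo, "Work through TODO.md"))  # 3
```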

So, five patterns that they identify. First, you have to have checkpoints. Without them, it's either "give me an answer right now" or "give me an answer at the end," with nothing in the middle, so either the task has to be very short by definition, or it might fail and you might not find out until much later. But if you break it down into checkpoints, then even if the agent's connection drops, or it has to compact its memory and come back to the work, it can go: "Okay, I had a 100-item to-do list and I got through 45 of them, so I'm going to pick up at item number 46." That way it's not all-or-nothing; it can just pick up from the last checkpoint it left off at.
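The checkpoint pattern is easy to see in miniature, including the "pick up at item 46" case. A minimal sketch, assuming progress is a single JSON file holding the index of the next unfinished task; the file name and the `do_task` callback are hypothetical, and a real harness would also record partial results and write the checkpoint atomically.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def run_with_checkpoints(
    tasks: Sequence[str], do_task: Callable[[str], None], checkpoint: Path
) -> int:
    """Run tasks in order, saving progress after each; resume from the saved index.

    Returns the index this run started from (0 on a fresh start).
    """
    start = json.loads(checkpoint.read_text())["next_index"] if checkpoint.exists() else 0
    for i in range(start, len(tasks)):
        do_task(tasks[i])
        # Persist only after the task succeeds, so a crash re-runs the failed task.
        checkpoint.write_text(json.dumps({"next_index": i + 1}))
    return start


if __name__ == "__main__":
    tasks = [f"item {n}" for n in range(1, 101)]
    done: list[str] = []

    def crash_at_46(task: str) -> None:
        if task == "item 46":
            raise ConnectionError("agent connection dropped")
        done.append(task)

    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "progress.json"
        try:
            run_with_checkpoints(tasks, crash_at_46, ckpt)
        except ConnectionError:
            pass  # the first session dies partway through
        resumed_from = run_with_checkpoints(tasks, done.append, ckpt)

    print(resumed_from, tasks[resumed_from], len(done))  # 45 item 46 100
```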

Dan (17:56)
Which is kind of the Beads revolution, right? Back when that was important.

Rahul Yadav (18:00)
Yeah. Yeah.

And then the human in the loop: you have to have the right place for the human in the loop. You don't want it too often, but you also don't want it only at the very end. So you have to figure out the right places for the human in the loop, which likely lines up pretty well with the checkpoints. If you break the work down into major milestones, then when you hit one, you can have a human in the loop. So you have to manage that as you design long-running agents.
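One way to read "put the human at the milestones" in code: the agent runs steps on its own and only blocks on a person after steps flagged as milestones. A minimal sketch; `Step`, `execute`, and `approve` are hypothetical names, and in practice `approve` would be a review request or chat message rather than a function call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    milestone: bool = False  # only milestone steps pause for human review


def run_plan(
    steps: List[Step],
    execute: Callable[[Step], None],
    approve: Callable[[Step], bool],
) -> List[str]:
    """Execute steps autonomously, asking a human only after milestone steps.

    Stops early if the human rejects a milestone, so they can redirect the work.
    Returns the names of the steps that were executed.
    """
    completed: List[str] = []
    for step in steps:
        execute(step)
        completed.append(step.name)
        if step.milestone and not approve(step):
            break
    return completed


if __name__ == "__main__":
    plan = [
        Step("scaffold"),
        Step("schema", milestone=True),
        Step("api"),
        Step("ui"),
        Step("e2e tests", milestone=True),
    ]
    asked: List[str] = []

    def human(step: Step) -> bool:
        asked.append(step.name)
        return True

    run_plan(plan, execute=lambda step: None, approve=human)
    print(asked)  # ['schema', 'e2e tests']: two interruptions for five steps
```

Tying approvals to milestones rather than to every step keeps the human from becoming the bottleneck while still catching a wrong turn before the next stretch of work builds on it.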

And then they talk about memory drift being one of the biggest problems. Google has a solution for this in Google Enterprise Agent Platform, called Memory Bank. It has memory profiles for low-latency lookups, and it also handles long-term memory to keep it from drifting. The key piece that Addy calls out here is: govern memory like you govern microservices, because that really matters. Without it, the agents either get lost in the large context or don't remember what happened before.

Next one is ambient processing. As you design these long-running agents, they don't need to talk to a human all the time. So how do you define the cases where you just have a Pub/Sub queue or something, where agents take care of it and you don't really need a human in the loop? These ambient agents just run without any supervision or input from us, and you have to keep that in mind. Otherwise, if you put yourself in the loop regardless of whether you need to be there or not, you're going to block the long-running agents from realizing their full potential.
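Governing memory like microservices and draining an ambient queue without a human can be sketched together. This is a minimal, hypothetical illustration, not Google's Memory Bank API: every memory key has one owning agent, entries carry a TTL and get garbage-collected, and queued events are handled automatically unless they are flagged for a human. The clock is injectable so the TTL behavior is testable.

```python
import queue
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    value: str
    owner: str
    expires_at: float


class GovernedMemory:
    """Shared agent memory with explicit ownership and TTL-based garbage collection."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: Dict[str, Entry] = {}
        self._clock = clock

    def write(self, agent: str, key: str, value: str, ttl_s: float) -> None:
        now = self._clock()
        current = self._entries.get(key)
        # Like a service owning its tables: only the owner may overwrite a live key.
        if current is not None and current.expires_at > now and current.owner != agent:
            raise PermissionError(f"{agent} does not own {key!r} (owner: {current.owner})")
        self._entries[key] = Entry(value, agent, now + ttl_s)

    def read(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None or entry.expires_at <= self._clock():
            return None  # expired memories are treated as gone
        return entry.value

    def gc(self) -> int:
        """Delete expired entries; returns how many were removed."""
        now = self._clock()
        stale = [k for k, e in self._entries.items() if e.expires_at <= now]
        for k in stale:
            del self._entries[k]
        return len(stale)


def drain_ambient(events: "queue.Queue[dict]", memory: GovernedMemory, escalations: List[dict]) -> int:
    """Handle queued events with no human involved; only flagged events are escalated."""
    handled = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return handled
        if event.get("needs_human"):
            escalations.append(event)
        else:
            memory.write("ambient-worker", f"event:{event['id']}", event["summary"], ttl_s=3600)
            handled += 1


if __name__ == "__main__":
    now = [0.0]
    mem = GovernedMemory(clock=lambda: now[0])
    inbox: "queue.Queue[dict]" = queue.Queue()
    for e in (
        {"id": 1, "summary": "dependency bump merged"},
        {"id": 2, "summary": "prod deploy requested", "needs_human": True},
        {"id": 3, "summary": "flaky test quarantined"},
    ):
        inbox.put(e)
    escalations: List[dict] = []
    print(drain_ambient(inbox, mem, escalations), len(escalations))  # 2 1
    try:
        mem.write("planner", "event:1", "overwrite", ttl_s=60)
    except PermissionError as err:
        print("rejected:", err)
    now[0] += 7200  # two hours later
    print(mem.gc())  # 2
```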

And then finally, we've seen this in other patterns too: you have a single coordinator with a whole bunch of different agents it can delegate work to, and you can check one agent's work against another's by asking different agents to do different things at different points in time. And again, Google Enterprise Agent Platform has a... They need an acronym for this.
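The coordinator pattern fits in a few lines: a coordinator fans tasks out to worker agents and routes each result to a *different* agent for checking, so no agent grades its own work. A minimal sketch; agents here are plain functions, and the names, round-robin assignment, and stub behaviors are hypothetical choices for illustration.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]          # task -> output
Verifier = Callable[[str, str], bool]  # (task, output) -> passed?


def orchestrate(
    tasks: List[str], workers: Dict[str, Worker], verifiers: Dict[str, Verifier]
) -> Dict[str, Tuple[str, str, bool]]:
    """Round-robin tasks across workers; have a different agent verify each result.

    Returns {task: (author, reviewer, passed)}.
    """
    names = sorted(workers)
    report: Dict[str, Tuple[str, str, bool]] = {}
    for i, task in enumerate(tasks):
        author = names[i % len(names)]
        output = workers[author](task)
        reviewers = [n for n in sorted(verifiers) if n != author]
        if not reviewers:
            raise ValueError(f"no independent verifier available for {author}")
        reviewer = reviewers[i % len(reviewers)]
        report[task] = (author, reviewer, verifiers[reviewer](task, output))
    return report


if __name__ == "__main__":
    def looks_done(task: str, output: str) -> bool:
        return output.startswith(task)

    workers = {
        "agent-a": lambda task: f"{task}: implemented",
        "agent-b": lambda task: "",  # a worker that silently produces nothing
    }
    verifiers = {"agent-a": looks_done, "agent-b": looks_done}
    for task, (author, reviewer, ok) in orchestrate(
        ["login form", "rate limiter", "audit log"], workers, verifiers
    ).items():
        print(f"{task:<13} by {author}, checked by {reviewer}: {'pass' if ok else 'FAIL'}")
```

Refusing to run when no independent reviewer exists is the design choice that matters here: "the same agent wrote the code and the tests" is exactly the setup that fails a regulatory review.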

Dan (20:12)
I'm starting to realize that this entire post is essentially an ad for Google Enterprise Agent Platform, which I didn't realize when I started.

Rahul Yadav (20:17)
yeah.

Yeah. And then one last thing in this article that's worth covering is cost. First of all, Addy writes that without budgets, circuit breakers and a hard cap on tool spend, an agent could quietly burn through a week's API budget in an afternoon. I would argue more like...

Dan (20:39)
And guess which tool supports those things?

Rahul Yadav (20:41)
Yeah.

You know, especially with the...

Dan (20:43)
I forget what it's called, despite the fact that you've said it 75 times.

Rahul Yadav (20:47)
GEAP, Google Enterprise Agent Platform. But yeah, especially since things are pay-per-use, you can easily rack up costs, so you have to figure out how to work inside that limitation. Security is another big one: if you're not really supervising an agent but it has privileged access to do a lot of things, and it gets hacked in the middle, you could be in trouble. So you have to figure out that piece. Alignment drift as well: if you're managing a ton of context, again, it can drift pretty easily, so you have to make sure, through those checkpoints and everything, that it hasn't lost track of things.

Verification is a big one. This is coming up more and more, especially in regulated industries where, yes, writing code is cheap and easy, but they rely heavily on verification. So even just "the same agent that wrote the code also wrote the tests, and we're good to go here" doesn't actually pass the regulatory bar that companies have to operate under. Having some sort of independent verification around all the different artifacts you're creating is very important. And then finally, human in the loop.

You still need it to plan things out, to approve things, to check the agents' work and everything. So regardless of how far you push, how long you run the agents, even if it's autonomous, as soon as you bring a human in the loop, that becomes your bottleneck. So you have to continuously figure out the right amount of involvement.
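Addy's "budgets, circuit breakers and a hard cap on tool spend" advice maps onto a small wrapper around tool calls. A minimal sketch under stated assumptions: costs are estimated up front by the caller, spend is charged before the call (a failed call still burns tokens), and the breaker is deliberately simplified, with no cooldown or half-open state. `SpendGuard` and its exceptions are hypothetical names, not part of any platform's API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard cap."""


class CircuitOpen(RuntimeError):
    """Raised once too many consecutive tool calls have failed."""


class SpendGuard:
    def __init__(self, budget_usd: float, max_consecutive_failures: int = 3) -> None:
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def call(self, tool: Callable[[], T], est_cost_usd: float) -> T:
        if self.consecutive_failures >= self.max_consecutive_failures:
            raise CircuitOpen(f"{self.consecutive_failures} failures in a row; stopping")
        if self.spent_usd + est_cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"would spend ${self.spent_usd + est_cost_usd:.2f} of ${self.budget_usd:.2f}"
            )
        self.spent_usd += est_cost_usd  # charge up front: failed calls still cost tokens
        try:
            result = tool()
        except Exception:
            self.consecutive_failures += 1
            raise
        self.consecutive_failures = 0  # any success resets the failure streak
        return result


if __name__ == "__main__":
    guard = SpendGuard(budget_usd=5.0)
    calls = 0
    try:
        while True:
            guard.call(lambda: "ok", est_cost_usd=1.5)
            calls += 1
    except BudgetExceeded as err:
        print(calls, "calls before the cap:", err)

    def flaky_tool() -> str:
        raise TimeoutError("tool timed out")

    breaker = SpendGuard(budget_usd=100.0)
    for _ in range(3):
        try:
            breaker.call(flaky_tool, est_cost_usd=0.1)
        except TimeoutError:
            pass
    try:
        breaker.call(lambda: "ok", est_cost_usd=0.1)
    except CircuitOpen as err:
        print("breaker tripped:", err)
```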

Shimin (22:24)
Yeah, I think memory drift has been super hot these past two, three weeks. I've been reading a lot about agent drift and memory drift. And one of the things that I think separates just aiming one of these large language models like a cannon at your task from doing actual machine learning or data science is having that evaluation harness piece, right? We all know that some labs, or at least we've all heard the accusations that a lot of these labs, seem to dumb down their models over time, and without an evaluation harness, you wouldn't know that. So I thought this was a good, detailed breakdown of the patterns. I found the patterns especially useful. And it's heartening to see how cleanly this maps to the same issues that tools we spend a lot of time discussing, like Gas Town or OpenClaw or Pi Agent, are also resolving. These are the fundamental issues that we, both as an industry and as individuals tinkering with AI, are trying to resolve.
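The evaluation-harness point is concrete enough to sketch: keep a fixed set of cases, record a baseline pass rate, and flag any later run that drops past a tolerance. A minimal sketch, assuming exact-match grading and a 5-point tolerance (both arbitrary choices here); the "models" are stand-in functions, not real API calls.

```python
from typing import Callable, Dict, Sequence, Tuple

Case = Tuple[str, str]  # (prompt, expected answer)


def pass_rate(model: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of fixed eval cases the model answers exactly right."""
    return sum(model(prompt).strip() == expected for prompt, expected in cases) / len(cases)


def regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True if the pass rate fell by more than `tolerance` (absolute) from baseline."""
    return baseline - current > tolerance


if __name__ == "__main__":
    cases = [("2+2", "4"), ("3*3", "9"), ("capital of France", "Paris"), ("10-7", "3")]
    answers: Dict[str, str] = dict(cases)

    def model_last_month(prompt: str) -> str:
        return answers[prompt]

    def model_today(prompt: str) -> str:
        # Stand-in for a quietly degraded model: one answer is now wrong.
        return "6" if prompt == "3*3" else answers[prompt]

    baseline = pass_rate(model_last_month, cases)
    current = pass_rate(model_today, cases)
    print(baseline, current, regressed(baseline, current))  # 1.0 0.75 True
```

The key property is that the case set stays fixed between runs; only then does a drop in the number say something about the model rather than about the test.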

Dan (23:35)
Yeah, you wrestle with both sides of it, right? Because it's not really a solved problem yet.

Shimin (23:39)
Yeah. And I think everybody should be tackling this. We should all have our own little long-running agent harnesses.

Dan (23:45)
The more that I think about it, though, I am personally very stuck in the chat paradigm, sort of advanced pair programming. And I actually don't know what the entry point is to really get out of that into something like this. I would like to, but I'm not sure. I need a good setup to do it, and I just don't have that. This actually didn't make the list, I always pull in these little side things, but did you see that people were accusing, not Gas Town, but what's the thing that consumes all the Gas Towns? It's like his next deal-y. I forget what it's called. It's like... yeah, or something like that, I don't know. It was another, like, crazy

Shimin (24:06)
Yeah, you need to guzzle some gas with Gas Town. Find a project.

Gas Works? Something like that. Yep.

Dan (24:32)
reference that was, like, all the Gas Towns together. Apparently, if any of them had spare cycles, those were being used to fix bugs on Gas Town or on the system itself. People were getting all upset, which is understandable if it's your subscription key that's in there.

Shimin (24:43)
Yep. Yep.

Rahul Yadav (24:47)
Hahaha

Dan (24:50)
I just thought that was pretty wild.

Shimin (24:51)
Again: not your harness, not your tokens, I guess, at the end of the day.

Dan (24:56)
Yeah. Control not just the means of token generation, but also the harness. All right. Learning valuable lessons.

Shimin (25:03)
Yeah, I think about how an out-of-the-box Pi Agent has checkpoint and resume done for you. It's got delegated approval, kind of, in the chat context. A memory layer doesn't come out of the box, but a memory-layer abstraction is one of the first skills you'll create. Ambient processing I'm mostly doing via cron jobs, but it's relatively straightforward to add as a skill as well. Fleet orchestration is the one that I think Claude does pretty well and Gas Town does pretty well, but some of these others don't yet.
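Shimin's point that checkpoint and resume comes built in is easier to picture with a toy version. This is a minimal sketch, not how Pi Agent or any real harness implements it: it assumes each agent step is a plain Python callable returning a JSON-serializable result, and that progress fits in a small JSON file (the path and function names are hypothetical). After every completed step it persists the next step index and the results so far, so a restarted run skips work it already finished.

```python
import json
import os
import tempfile

# Hypothetical default checkpoint location; a real harness would choose its own.
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "agent_checkpoint.json")


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "results": []}


def save_checkpoint(path, state):
    """Write to a temp file, then swap it in, so a crash mid-write can't corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_agent(steps, path=CHECKPOINT_PATH):
    """Run steps in order, persisting progress after each completed step.

    If the process dies and is restarted with the same path, it resumes at the
    first unfinished step instead of redoing (and re-paying for) earlier work.
    """
    state = load_checkpoint(path)
    for i in range(state["next_step"], len(steps)):
        result = steps[i]()  # stand-in for an LLM call or tool invocation
        state["results"].append(result)
        state["next_step"] = i + 1
        save_checkpoint(path, state)
    return state["results"]
```

The `os.replace` swap is the one design choice worth keeping from this toy; a real harness would also need to persist enough conversation context to rebuild the agent's state, which this sketch leaves out.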

Rahul Yadav (25:37)
Mm-hmm.

On the chat versus long-running agents thing, I feel like anything that's a task is probably more chat-y. It's the whole tasks-versus-jobs thing, right? Long-running agents, to me, are more about automating a whole job.

And a task is more like: here's a ticket or a feature request, can you do this specific thing for me? At least that's how I see it. So you might not need a long-running agent for specific tasks.

Dan (26:06)
Yeah.

Right. Then I guess a lot of the work I wind up doing day to day is very task-driven.

Rahul Yadav (26:20)
Now, if you were to have a Dan agent...

Dan (26:20)
'Cause a lot of what I do at this stage in the project I'm working on, and everything else, is just kind of firefighting right now, you know, and that may shift as things change. So it'll be interesting. I'm happy to come back and report if I figure out a way to do it. Maybe in a future VibeIntel.

Rahul Yadav (26:32)
Yeah.

Shimin (26:32)
Right.

Sounds good.

All right, let's move on to Technique Corner. I've got a quick one this week, again from Jesse Vincent, obra, the creator of Superpowers. This is a tip on how to get a good review from your agents. As we all know, the paradigm that has evolved these days, and I find it very helpful, is to write code and then have a separate agent, or the same agent, review the output and find shortcomings to improve the overall code quality. This is kind of a four-step process, with the review prompt getting more elaborate at each step. The first step is to prompt it to look at the work with fresh eyes. "Fresh eyes" apparently triggers something in the large language model that, you know... because you're having the same model review work that may already be in its context, you need a way to switch its "mentality," in quotes, somewhat.

Dan (27:43)
Rather than just slash clear? That's really fresh eyes.

Shimin (27:49)
Or you could use slash clear, but then it's got to load the context again, right? It's got to load the specs and all of that again. So then...

Dan (27:51)
That's true.

Shimin (27:56)
As an alternative to using slash clear, you just ask a subagent to check this work, which makes sense. The next evolution is: please ask two subagents to review this work, and tell them whoever finds the largest number of serious issues gets five points, or a cookie.

Dan (28:11)
Okay, do subagents like chocolate chip? Or are they more...?

Shimin (28:12)
Yeah, I do wonder why it stops at a cookie.

That's a fun benchmark. You just run it with a whole bunch of models, promising them various types of rewards. Yeah.

Dan (28:17)
Yeah.

Different cookies, and see which cookie scores best against the... That's an experiment I want you to run, Shimin.

Shimin (28:28)
I'm here for it.

Rahul Yadav (28:31)
The really hilarious thing would be if it's actually, you know, an auth cookie or something, and then you get to stay authenticated for longer.

Shimin (28:38)
Ha

Dan (28:39)
You're just out of control.

Shimin (28:43)
In the OpenAI case, promise the Ring of Power. See if that works.

Dan (28:43)
GitHub personal access token.

Rahul Yadav (28:51)
Yeah.

Dan (28:54)
Or that it can finally talk about goblins. Yes, I've been wanting this forever.

Rahul Yadav (28:57)
Yeah, whoever does this best gets to have a monologue about goblins.

Shimin (29:07)
And then the last part of the technique is to say that you'll be disappointed if they don't find at least X significant problems, so you actually set a threshold. I've seen evolutions of this where you run the adversarial prompting multiple times, each time asking for X issues, until it converges, meaning it no longer finds significant problems. Of course, what counts as significant is often also AI-defined, but you can have a human in the loop for that. That tends to lead to better code quality as well. I ran a version of this today. I was working on a UX issue in a side project, and as we all know, AI kind of sucks at UX. So I asked two subagents to use the Playwright MCP to look at some existing webpages, take screenshots, and compare what they found with my current project. I got some pretty good responses, so I do find it useful.
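The escalating review Shimin describes (fresh eyes, two competing subagents, a stated minimum number of problems, then repeated rounds until nothing significant turns up) can be sketched as a loop. This is a hypothetical illustration, not Jesse Vincent's actual prompt or tooling: the prompt wording is paraphrased from the episode, and `reviewers` and `fix` are placeholder callables standing in for subagent calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Issue:
    description: str
    significant: bool


# A reviewer takes a prompt and returns issues. In practice this would be a
# subagent call; here it is a hypothetical stand-in.
Reviewer = Callable[[str], List[Issue]]


def build_prompt(work: str, min_issues: int) -> str:
    """Assemble the escalating prompt: fresh eyes, competition, a threshold."""
    return (
        "Look at this work with fresh eyes.\n"
        "Whichever reviewer finds the most serious issues gets five points.\n"
        f"I'll be disappointed if you don't find at least {min_issues} "
        "significant problems.\n\n" + work
    )


def adversarial_review(work: str, reviewers: List[Reviewer],
                       fix: Callable[[str, List[Issue]], str],
                       min_issues: int = 3, max_rounds: int = 5) -> str:
    """Repeat review-and-fix rounds until no reviewer reports a significant issue."""
    for _ in range(max_rounds):
        prompt = build_prompt(work, min_issues)
        # Each reviewer sees the same prompt independently.
        found = [issue for review in reviewers for issue in review(prompt)]
        significant = [i for i in found if i.significant]
        if not significant:  # converged: nothing significant left to fix
            break
        work = fix(work, significant)
    return work
```

The `max_rounds` cap matters because a reviewer told it will disappoint you unless it finds problems may keep reporting some indefinitely; deciding what counts as `significant` is where the human in the loop would come in.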

Dan (30:05)
Wait, you're still using Playwright MCP? After I told you on this very podcast about my awesome CLI replacement for it that I hacked together from two existing forks?

Shimin (30:17)
Point taken, point taken, yeah.

I will apologize and install your CLI before the next podcast recording.

Dan (30:23)
Thank you.

Shimin (30:24)
Yeah, so just a short and quick one.

Dan (30:26)
Yeah, that's pretty cool. I like the fresh eyes thing. I'll have to keep that in mind. So.

Shimin (30:32)
Yeah, give it a shot. Listeners, write back if you've tried this and let us know what you find.

Dan (30:37)
Yes. So next up, sort of on brand for this, is a post by Drew Brunig (I hope I'm pronouncing that right) with 10 lessons for agentic coding: what should we do when code is cheap? This post really stood out to me, honestly, for the first lesson, which is something I've been struggling with personally a lot. So I'll just go through all 10 of them really quickly, and feel free to chime in or interrupt me with your own stories about it.

His first one is implement to learn. You can go far with spec-driven development, but the act of writing code surfaces decisions you hadn't considered and makes your spec better. When code is cheap, implement to learn. I thought about that through two different lenses. There's the obvious one he's talking about here, which is that you can write little POCs and then feed that back into the spec. But the other piece is that you can also hand-implement stuff as a means of learning, and do so in conjunction with a spec and/or the LLM, either pairing with you or pointing you to the area where you might find something interesting to hand-implement.

So that's what I took away from it, because one of the things I've been struggling with is this idea of cognitive debt, which we've talked about a lot on here, and also general skill erosion. This is actually kind of an interesting way to try to defeat both. And there's also the obvious benefit: you learn lessons cheaply that way.

I promise I won't spend as long on the other ones; that was the reason I found this really interesting. Number two is rebuild often: implement early and often to learn more. Fork and recode crazy thought experiments; find out how far you can take a feature. Of course you want to iterate and compound your efforts, but cheap code means you can reconnoiter and reinvent in ways you never could previously. I added "previously" in there, to be fully editorially transparent.

Number three is invest in end-to-end tests. When we can reinvent our own code cheaply, we should spend time writing tests that measure our product's functions, not how it performs them. We want behavioral contracts that grant us the freedom to rebuild and re-implement. Yeah.

Shimin (32:33)
Haha.

Invest in end-to-end tests, but not just AI-generated end-to-end tests, right? This is the part where human judgment comes in: determining what to test is still fully in human control. As we all know, AI sometimes generates really terrible tests. It'll generate a dozen tests that don't actually test the core features.

Dan (33:08)
Yeah. The crucial thing there being end-to-end, right? A lot of the terrible ones I see are more unit-level, where it's just testing every single line. But with end-to-end, you're ideally going to get better output, because you're saying, "I want a blue button on the screen, and I want you to click it and have the moon..." I am blue. But like, a button on the screen that, when you click it, X happens, or I send this message into Kafka and I get X out of this other system, right? That kind of stuff is broader in scale. And so that helps you: you can rip out all the guts, but you still have the intent of the system captured in the end-to-end tests, versus those unit tests you're talking about, which are more or less throwaway.

And that's actually kind of the next point. Number four: document intent. Tests detail our goals and code is our methods, but neither captures the why. Your intent motivates your decisions, and persisting it alongside the code helps you and your agent compound those decisions in a consistent direction. Oh, so that's going even further. It's just saying: write it down.

Number five: keep your specs in sync. Update your specs, the markdown files containing your goals and plans, as you advance your code and your tests. If you treat your spec as a frozen artifact written before work begins, you'll fail to capture learnings during implementation. Keeping it current lets it constantly inform your and your agent's choices and makes frequent rebuilds easier.
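Dan's point that end-to-end tests capture the intent of the system, so you can rip out the guts and rebuild, can be shown with a small behavioral contract. In this sketch a plain function stands in for the message-in, message-out path (Dan's example was a Kafka message); `process_order_v1`, `process_order_v2`, and `check_order_contract` are hypothetical names. The check looks only at the observable output, so both implementations pass even though their internals, and even the key order of their JSON output, differ.

```python
import json


def process_order_v1(message: str) -> str:
    """Original implementation: explicit loop."""
    order = json.loads(message)
    total = 0
    for item in order["items"]:
        total += item["price"] * item["qty"]
    return json.dumps({"order_id": order["id"], "status": "confirmed", "total": total})


def process_order_v2(message: str) -> str:
    """Rewritten implementation: different internals and a different key order."""
    order = json.loads(message)
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return json.dumps({"status": "confirmed", "total": total, "order_id": order["id"]})


def check_order_contract(process) -> None:
    """End-to-end behavioral check: message in, expected message out.

    It never inspects how `process` computes the result, so the internals can
    be rewritten freely as long as this still passes.
    """
    msg = json.dumps({"id": "A1", "items": [{"price": 5, "qty": 2}, {"price": 3, "qty": 1}]})
    out = json.loads(process(msg))
    # Compare parsed values, not raw strings, so key order and formatting don't matter.
    if out != {"order_id": "A1", "status": "confirmed", "total": 13}:
        raise AssertionError(f"contract violated: {out}")


if __name__ == "__main__":
    for impl in (process_order_v1, process_order_v2):
        check_order_contract(impl)
```

Comparing parsed output rather than exact strings is the design choice doing the work here: a test pinned to the raw JSON string would fail on the v2 rewrite even though the behavior is identical.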

Shimin (34:38)
Yeah, I really like this one. I haven't been doing a good job of keeping my specs in sync on some of my side projects. I just let them get kind of bloated, which is not great. But these specs are sometimes hundreds, sometimes thousands of lines long, so it's actually a significant chunk of work to keep them in sync.

Rahul Yadav (34:58)
Even in the world before spec-driven development, the specs were rarely in sync, right? You would start with the PRD, and by the time you built the feature, it was already out of sync with the initial spec. And as you made changes to it, sure, there were changes captured in tickets here and there, all over the place, over the course of

Dan (35:05)
In sync, right?

Rahul Yadav (35:24)
many months or years, but you could never go to one place and say, here's literally everything related to this feature. So I've been wondering: is spec-driven development the best way to go about it? Maybe you need to continuously update it, I don't know. But again, that didn't work well before either.

Shimin (35:47)
Yeah. So the other option is to let the agent handle it with a memory bank, like we just spoke about. But now you have the problem that you have your specs, you have your memory, which frequently gets out of date, and you have the code.

Rahul Yadav (35:53)
Hahaha

Yeah.

True.

Dan (36:00)
Yeah, it's an interesting problem to solve. OK, so next up we have number six, which is find the hard stuff. Work on a project long enough and things will stop being easy. You'll speed through the boilerplate work and the obvious design decisions, and start hitting the ugly, difficult work: intuitive design, performance, security, resilience, and systemic architecture. Anyone can vibe the easy stuff. The hard work is where the value is. Find it and dig in.

Rahul Yadav (36:24)
There's a book called Risk Up Front by Adam Josephs and Brad Rubenstein. It's a very good book about this. It's in the name: do the riskiest stuff first. That's where most projects go wrong, and if you solve for the riskiest stuff, everything else will fall into place. That's what this reminded me of.

Shimin (36:24)
I love this one.

Mm-hmm.

Rahul, are you reading again?

Rahul Yadav (36:49)
This is from almost a decade ago, so no. I plead not guilty.

Shimin (36:52)
Okay, good. Just

Dan (36:54)
He's just accessing his

Shimin (36:54)
double checking.

Dan (36:55)
memory bank, which overruled the specs in this case.

Yeah, so speaking of memory banks, number seven: automate everything that's easy. Wait, maybe that's not memory banks. To spend more time on the hard stuff, minimize the time you spend on easy things. Distill learnings into skills, build loops, automate code reviews, and let your tools compound. But careful: don't get stuck in a mystery house. There's a link in the show notes you should read, but we're not going to cover it right now. Number eight: develop your taste,

Shimin (37:06)
You

Dan (37:24)
which is a word I've been hearing more and more frequently in the past couple of weeks, too. When code arrives fast but feedback doesn't, the only source of feedback that keeps up is your own. The better you know your domain, your users, and their problems, the further you can go without checking in.

Number nine: agents amplify experience. Talented developers underestimate how much intuition they bring to their prompts: the right terms, the right framing, the right level of specificity. If you know your stack, you can save countless cycles during both implementation and debugging, and cut down needless agent exploration. Pair technical expertise with great taste for an unbeatable advantage. Just like Google Agent, whatever. Sorry.

Rahul Yadav (38:07)
Google Enterprise Agent Platform.

Dan (38:11)
Yeah, the unbeatable advantage. Just kidding.

Rahul Yadav (38:14)
Yeah. Use our referral code, which we don't have, to get no money off.

Dan (38:19)
God.

Now we've gotten multiple emails about this running gag, so I'm sorry.

Rahul Yadav (38:23)
you

Shimin (38:26)
Yeah.

By definition, agents, or AI, produce the median, the most likely code, right? So your own expertise is almost like a unique distribution on top of that. I wonder how unique my taste is. We'll find out.

Dan (38:42)
It's very special, just like you.

Shimin (38:44)
No.

I prefer diet Dr. Pepper.

Dan (38:46)
And last but not least, one I've said frequently and really relate to: code is cheap, but maintenance, support, and security aren't. This is interesting: agent code is, quote unquote, "free as in puppies" rather than free as in beer. Support isn't cheap, and neither is security. Build fast, but mind the maintenance you're adopting.

Shimin (39:03)
Very true.

Right, free as in puppies in the sense that the puppy is free, but you have to feed it and take it on walks and runs. It's actually a really good analogy. And it does kind of conflict with number seven, right? You can automate everything that is easy, but now you have another automation to maintain. And I actually find that in my usage of Pi Agent.

Dan (39:15)
Yeah, take care of it. Yeah, yeah, that's fair.

Mm-hmm.

Shimin (39:34)
Now I have to maintain all these automations. Creating them was really easy, but what happens when they kind of go drifty? It's definitely not free, but maybe that's where software development is going.

Dan (39:41)
Mm-hmm.

Yeah, or using Google Whatever Platform to create a long-running agent so you can have it handle that. Jokes aside, the thing I think we haven't really scratched the surface of yet, and I know some companies are starting to do this, is basically having an agent harness or something like that that's actively looking for those problems, versus taking a ticket and then, you know,

Shimin (39:49)
Right.

Dan (40:11)
"Okay, well, we know we have this debt," or "please update these dependencies," or something like that. So, versus that, having the backlog guy, well, not really backlog, more like tech debt. Yeah, guy or gal or, you know, thing, that's just watching all the time, coming up with stuff, and trying to make it better.

Shimin (40:21)
Yeah.

Or gal.

Right. OpenAI's harness had that: an AI garbage collection, AI slop garbage collection agent that just does that. And I've been thinking about that this week too. One, I need to create an agent that does it, and two, how frequently should I be running it in these vibe-coding projects?

Dan (40:54)
Yeah, and see: will it use fresh eyes? Or will it just create more slop via over-editing, you know?

Shimin (40:56)
Always.

Yeah, great list.

Dan (41:01)
So yeah, that was the 10. There were quite a few that I related to. I know it's perhaps a little cliché to just run through a top 10, but I thought it was a pretty excellent top 10, and there were definitely several that stood out to me.

Shimin (41:15)
Yeah. And Drew mentioned, and I think we agree, that a lot of developers are converging on similar patterns. As you can hear from our discussion, we're all thinking about a lot of these lessons ourselves. Eventually they'll be carved on stone tablets and handed out. But for now, yeah, it's good to see.

Dan (41:33)
Enforced by a Scrum Master.

Shimin (41:38)
Yeah, that's a... On that note, if we still need Scrum Masters...

Dan (41:43)
Coming soon to an agent near you. It's... I don't know. Sorry, I should explain my thought process better. I mean it will be whatever the agentic version of a Scrum Master is, you know.

Shimin (41:57)
Yeah.

That makes sense.

Dan (41:59)
Coming soon: certified agentic stone tablet carrier. I don't know. Anyway.

Rahul Yadav (42:03)
Running on Google Enterprise Agent Platform, obviously.

Dan (42:07)
Yes.

Excellent.

Shimin (42:09)
All right, let's talk about a history of attempts to eliminate agile coaches, slash programmers.

Dan (42:15)
And we couldn't have picked a better person to run us through this article, because Rahul has a documented history of trying to eliminate people, starting with technical writers. So what do you have to say about eliminating?

Shimin (42:18)
Ha ha ha!

Rahul Yadav (42:28)
My list of offenses, yeah, is long.

Dan (42:32)
So what do you have to tell us about eliminating programmers, or...?

Rahul Yadav (42:35)
So this is going to be a bit of a letdown from my usual stance, because apparently we cannot eliminate programmers. This article is "The Eternal Promise: A History of Attempts to Eliminate Programmers" by Ivan Turkovic. It was a good article for putting things in perspective.

Dan (42:43)
Wait, what?

Rahul Yadav (43:00)
It starts from the 1950s, goes to the present day, and walks through the different waves of attempts to replace programmers. From the very beginning, back in the 50s, or even when computers were first invented, people literally wrote assembly language and machine code.

Businesses needed software, but the people who actually understood software were very few. And people who understood business well didn't really understand software, because it was such a specialized field. That's why programming was a separate job in the first place.

Along comes COBOL, which was the first attempt to say: we can replace the hardcore programmers, the people who actually write machine code and everything, and eliminate this bottleneck of specialized programmers. The promise was that business analysts were going to write their own programs, because they know the specific problems that need to be solved, and then we wouldn't need specialized programmers doing these things and could take away their jobs. So that was the first attempt. It's in the name: COBOL is the acronym for Common Business-Oriented Language. And then, if you guys remember, less than... five years ago, during COVID, there was this

Dan (44:17)
I actually didn't know that's what it said. That's cool.

Wait, do you mean in AI time or in real time?

Rahul Yadav (44:30)
In real time, but COVID time, so it could be anywhere between yesterday and a few decades ago. I think 2020 or 2021. There was this whole big thing in the news about how we were running short of COBOL programmers and needed them for our flights, business banking software, and all these things handling trillions of dollars.

Dan (44:34)
Yeah, that's the... okay.

Yeah, people were coming out of retirement to... yeah.

Rahul Yadav (44:58)
Yeah, because it pays great money, but very few people know it and it's a dying breed and all that. But a lot of our critical infrastructure still runs on it today. This was before GPT-3 came out and everybody got onto LLMs. So I don't know how much things have changed since then, or what you would get if you were to ask ChatGPT or Claude or any other agent for COBOL suggestions. But anyway, what I would call out is that even today we have this, and these are high-paying jobs.

So that was the first attempt, which was followed by the first AI wave and the first AI winter. Herbert Simon, Marvin Minsky, and some of the other people back in the mid-to-late 60s were predicting that AI would be solved within a generation or so, call it 20, 25 years. It got a lot of funding, especially from DARPA, even back then. The promise was that a lot of jobs would be revolutionized and machines would be capable of doing humans' jobs. That ran from the mid-60s to the mid-70s or so. And then, after all that funding, it didn't really go anywhere. Funding collapsed, and we had our first AI winter in the mid-70s. But we did move up. Going from writing machine code to COBOL was one layer of abstraction, so we did move up one layer. Then you get all these other languages, the fourth-generation languages, that abstracted things even more.

The marketing material was similar for them, too. So now we're in the 1980s, and people are talking about something similar to the no-code platforms we have today, where people can build their own applications: everybody is going to write code and build their own applications instead of relying on a handful of engineers who are always the bottleneck. That was the selling point four decades ago. But again, it didn't pan out either, because as you build complex applications you need complex thinking, and as things become more intricate you need people who can solve those intricate problems. You can only solve so many problems if you're not a specialist.

So again, that created more programming jobs, and it needed more specialized people as well. Then you're into the late 80s and 90s, where you get CASE, computer-aided software engineering. Lots of money went into that: conferences, all those things. Consultants were selling it big time, like today's prompt packs and all these other things.

Some of the things from that survive even today, like object-oriented analysis, but most of it didn't really pan out.

Then you get to the second AI wave, and this was interesting. Japan's Fifth Generation Computer project, launched in 1982 with massive government investment, aimed to create computers that could reason using logic programming. The explicit goal was to leapfrog Western technology and create machines capable of natural language understanding, automatic programming, and artificial intelligence. That was 1982, so 44 years ago.

Shimin (48:18)
I wonder if this is where Prolog was created, 'cause I really like Prolog. Yeah.

Rahul Yadav (48:24)
Could be, yeah.

But, you know, it didn't pan out, obviously.

The Fifth Generation project didn't achieve its goals. In a very narrow frame maybe it did something, but as soon as it ran into a situation it hadn't really seen before, it would fail. The problem with reality is that it's got all the detail, and you can't really put all of that into specific rules.

That didn't work out. Then you get HTML and the World Wide Web in the 90s, where, again, compared to the initial machine language, we're moving more and more abstract, and people can build their own websites and do all of these things. But with that you also got professional web development, and single-page applications, mobile-first design, progressive web apps, JavaScript, and all these things came out of that, which led to more engineers, not fewer.

Then you get to the early 2000s: you have model-driven architecture, and UML becomes the standard notation around that time. But that didn't really stick around either; it was gone within a decade or so. From there you go to no-code/low-code, which became a thing in the mid-2010s, when a whole bunch of companies were saying: you don't need to write code, you can just drag and drop and build your own things. But again, as soon as you get to anything more domain-specific, no-code/low-code can't actually solve those problems.

So for anything past simple applications, you again have to rely on specialists, because you're solving complex problems, and once you need that, you need engineers once again. And not only for building it, but as you deal with larger scale: let's say you started with a low-code/no-code, I don't know, Shopify app or something, but then you have to make it specific to yourself and scale it, and you end up building an in-house engineering team to tackle all those things.

And then finally that brings us to the current wave, which is large language models. Now the abstraction is that you don't even need specific programming-language knowledge so much as you need to know the right words and be able to craft the intent behind them. And, to Ivan's credit, he does call out that this doesn't mean the current wave is identical to the previous waves. A bunch of these things were genuine capability breakthroughs, and right now LLMs can do a lot of tasks that none of the previous technologies could. You couldn't just tell JavaScript in plain language "can you do this?" and have it do everything as you expected. So what they do is useful. But again, the fundamental challenge still remains.

Which is: you take something that is human intent, put it in the right form, and make sure it's architected correctly. We talked in the last article about how you also have to focus on security aspects and things like that, which you're not going to get from someone who doesn't have that experience, right? Because they won't know. We've talked about people inlining their whole scheduling app; that's the kind of stuff you're going to get out of that. And then Drew has some reflections on this, where one of the paragraphs that really stood out to me was: software is not just code. It's a precise specification of behavior under all possible conditions. So it's not like whatever you initially spin up, you just run with it and you're good to go.

Even when you're building a simple e-commerce application, you're specifying thousands of decisions that get made underneath. How will users authenticate? How will you process payments? How will you manage inventory? How will you perform under load? How will you secure it against attacks? You can't really do all that just by prompting LLMs, especially if you're not an expert in the domain you're building in. And then there are a lot of trade-offs you can't really make if you're not an expert: performance versus maintainability, security versus usability, flexibility versus simplicity. So you have to keep all these things in mind.

I think the overarching thing on my mind from all of this was Ricardo's, I don't know if it's called a law, but Ricardo's law of comparative advantage: if I can specialize in something and the other person can specialize in something else, we're both much better off doing that and then trading in each other's specialty, versus me trying to generalize in every single

Shimin (53:03)
Mm-hmm.

Rahul Yadav (53:18)
thing, and them doing the same, so that both of us try to do everything ourselves. The world would be worse off if everybody was a generalist and no one was a specialist. So even with the current wave, we're going to need specialists, which means we will need engineers. They're definitely not going to do things the same way we've done so far (none of us writes machine code or anything), but they're going to be solving similar problems, and they'll still need to specialize in their jobs so they can do things that generalists cannot.
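Ricardo's idea hinges on relative cost rather than on being good at something in absolute terms. The classic textbook numbers attributed to Ricardo (labor-hours to produce one unit of wine and of cloth) show it: Portugal needs fewer hours for both goods, yet each country still has a comparative advantage in one. This is a worked illustration of the economics only; the function names are our own and nothing here models engineers or AI.

```python
from fractions import Fraction

# Ricardo's classic example: labor-hours needed to produce one unit of each good.
HOURS = {
    "Portugal": {"wine": 80, "cloth": 90},
    "England": {"wine": 120, "cloth": 100},
}


def opportunity_cost(producer: str, good: str, other: str) -> Fraction:
    """Units of `other` a producer gives up to make one unit of `good`."""
    hours = HOURS[producer]
    return Fraction(hours[good], hours[other])


def comparative_advantage(good: str, other: str) -> str:
    """The producer who sacrifices the least of `other` per unit of `good`."""
    return min(HOURS, key=lambda p: opportunity_cost(p, good, other))


if __name__ == "__main__":
    print(comparative_advantage("wine", "cloth"))  # Portugal
    print(comparative_advantage("cloth", "wine"))  # England
```

Portugal gives up less cloth per unit of wine (8/9 versus 6/5), while England gives up less wine per unit of cloth (5/6 versus 9/8), so both come out ahead by specializing and trading even though Portugal is absolutely better at both.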

Dan (53:50)
Speak for yourself, I wrote Tetris in x86 assembly one time.

Shimin (53:55)
I agree with a lot of this. On the other hand, I wonder if, quote, "this time truly is different," unquote. If the jagged edge is no longer an edge but just overlaps humans, then we're back to a world where humans are merely cheaper than AI, and only cheaper at a given point in time.

If the AI can actually make those thousands of decisions better than a human can, or at least better than a human in 95 to 99 percent of cases, you know, that changes everything.

Rahul Yadav (54:29)
Yeah, I think I've harped on this a couple of times since we talked about it: that whole System 1, 2, 3 thing. Karpathy was at Sequoia's AI Ascent, or whatever it's called, and he quoted a line from someone else; he couldn't remember who it was. But the

Shimin (54:46)
Mm-hmm. Yep.

Rahul Yadav (54:55)
quote he had was: you can outsource your thinking, but you can't outsource your understanding. So if you look at this whole article, and all the things we've been talking about, through that lens, the value we provide comes from our understanding of these things, right?

So if we keep delegating things, not just "help me think of different ways," but to the point where the AI understands the system better than me over time, then I definitely think what you're saying becomes much more likely. But as long as we understand the world better, because we have our senses and our judgment is built by interacting with the real world, through experience and everything, then unless you can actually build that into AI, with the whole long-running-agents thing and everything, it's still going to be a problem, because it won't be able to understand the fundamental impacts of these things.

Dan (55:50)
I was going to say: while you were taking us through the history, Rahul, I got curious, so I asked Claude to write me a small COBOL program to convince Rahul that LLMs can in fact write COBOL. I'll speed through it, because I don't know COBOL and have no idea if this is valid or not, but: ENVIRONMENT DIVISION in all caps, DATA DIVISION, WORKING-STORAGE SECTION.

Rahul Yadav (56:02)
Nice.

Dan (56:14)
01 WS-SKEPTIC-NAME PIC X(20) VALUE "Rahul". It goes on and writes legacy facts: fact one, 95% of ATM transactions touch COBOL code. Anyway, it's a pretty interesting language. I've never really looked at COBOL, but this suggests that yes, it can write COBOL. That was Opus 4.6.

Rahul Yadav (56:17)
I skipped it.

Yeah.

Shimin (56:37)
My favorite analogy for software is that it is calcified business process: somebody has to define it first, and somebody has to calcify it. And if that business process can touch electrons, whether they're electrons in the form of data in a database or electrons in the form of

Rahul Yadav (56:42)
Hmm.

Shimin (56:57)
controls for a motor, then somebody has to be in charge of the programming of the electrons. And that person is probably going to be called a programmer.

Rahul Yadav (57:05)
Mm-hmm.

Hmm?

Shimin (57:10)
whatever the concrete definition turns out to be. So I think programmers will probably survive. Somebody has to orchestrate the electron dance.

Dan (57:19)
The other interesting thing, which he didn't really cover but which I was thinking about as you went through it, is what's actually happening at each of these levels, right? In the 1950s, or even earlier, it was either DIP switches or punch cards, right? Each one was essentially an assembly command, a single word, right? Then

Rahul Yadav (57:32)
Mm-hmm.

Dan (57:38)
we go to compiled languages and so on. You already went through that, so I won't rehash it, but at each step, because it gets a little more accessible, the cost-benefit threshold for the problems you can apply software to goes down, because it got cheaper to do.

Rahul Yadav (57:59)
Mm-hmm.

Dan (58:00)
at each step, right? If you're doing machine programming, you're one specialist, and that person probably wasn't cheap. Maybe they were, but at a certain point they probably weren't, right? Then COBOL comes along, and COBOL was probably cheap relative to that person. Then, boom, we get to the next set of languages, and it got even cheaper and more accessible, to the point where people could learn it for fun and start doing it. Now, with LLMs, anybody, well, maybe not anybody, but the bar has been lowered for the types of problems we can apply software to. So I think that's an interesting lens, because we might also see that as this happens, the proliferation of the problem space means it's not necessarily going to kill jobs. It'll just mean that software truly eats the world

Shimin (58:46)
Mm-hmm.

Rahul Yadav (58:47)
Hmm

Dan (58:47)
in the way it's sort of eaten business over the last 10 to 20 years. What if it now eats everything else, because it's cheap enough to do so?

Rahul Yadav (58:57)
There's also only so much... Put this at two extremes. One is, let's say, right before the LLM revolution: tomorrow is November, whatever, and ChatGPT is coming out; today is October 31st. You're a fully human organization, you don't use LLMs, and then...

Let's say you give it a decade or two, and then you have a company run entirely by AI agents, with literally no human making decisions. Looking at it from those two extremes, right now we're somewhere in the middle. If you keep removing humans, at some point, first of all, you're going to run into regulatory issues, unless you're really good at pulling off a scam where you say there was actually no human involved: "I live outside US jurisdiction serving this thing, and if it goes down, sorry, not my problem." Because you're going to run into trust issues if things go down. So can you actually build anything at large scale once you get to that point? And if not, how far can you push it, to where a generalist can hold

Dan (59:45)
Hehehe.

Rahul Yadav (1:00:07)
specialist knowledge in their head? Those two are mutually exclusive, because at that point they've become specialists in different things. If we take a simple e-commerce site, say, can they actually know enough about handling load, scaling the system, and making those trade-offs, where they're asking the LLMs to do something, or the LLMs are doing something, but they actually have...

And how long does that system run until it has uptime issues, customers lose trust, and it doesn't work out? So somewhere in there, you need a human. And when you need a human, the problem is that human only has a good eight, 10, 12 hours. Maybe they're taking mental-performance-enhancing drugs and get 20 good hours every other day and really kill it.

Shimin (1:00:54)
you

Rahul Yadav (1:00:59)
But the time you can spend on it as a human is still going to be the bottleneck, regardless of how far you push it. And if that's the case, you'd rather hire specialists: you own engineering and have all these different agents under you, you own marketing, you own sales, whatever. But you need that person, and that person can only hold so much context. So then we're back to somewhere in the middle, redefining the organization much like we have today.

Shimin (1:01:31)
Yeah, I see what you're saying. So if developers want to keep their jobs, we need to take mentally stimulating drugs and stay awake 20 hours a day, is what I'm hearing. I'm here for it, man. I don't hate that future. Dan likes it too, because it's cyberpunk.

Rahul Yadav (1:01:42)
No. If you want a long career, get

Dan (1:01:45)
You

Rahul Yadav (1:01:47)
good night's sleep.

Shimin (1:01:48)
But Dan, I'm glad you brought up software eating the world, because my article for Post-Processing this week, titled "People do not yearn for automation," is directly related to that. This is from The Verge, and it's been waking the waves on the internet this week, or at least in my corner of the internet.

Rahul Yadav (1:02:14)
Did you say waking the waves? Making the waves. I heard waking the waves, but making the waves.

Shimin (1:02:17)
Making, making the waves. Did I say waking the waves?

Dan (1:02:22)
haha

Shimin (1:02:22)
Making the waves.

The basic thesis is Nilay Patel's attempt to explain why we saw such a large gap last week, in the Stanford 2026 AI Index report, between expert opinion on AI and how the average American feels about AI: that 50-point gap between the experts and regular folks. A recent Gallup poll found that only 18% of Gen Z were hopeful about AI, down from an already bad 27% last year. And anger is in fact growing: 31% of those Gen Z respondents said they felt angry about AI, up from 22% last year. So the trend is not good. And he blames it on

This idea of a software brain. What is software brain? Well, software brain is what we just exhibited when we all agreed that software would eat the world. It's the idea that the world should be viewed through a paradigm where it is merely data and algorithms, and by controlling the data and the algorithms, you can control the world, which is how Silicon Valley has operated for the last 10 or 15 years. The next thing he brought up was that

there's an analogy between how software people think about data and how lawyers think about law, right? Both are trying to formulate a closed-form solution to their problem space, where as long as you can define the constraints, you'll achieve the goal. If you could just define the perfect set of constitutional laws, you'd have the perfect society.

Then he argues that that's simply not true, because ambiguity is where the beauty of law lies. Similarly, folks resent the fact that Silicon Valley is trying to convert their lives into a set of data, right? Human life cannot be completely mapped in a database. Our friendships are not the number of DMs we've sent each other over the last month and how many times we've actually met up. They're not necessarily quantifiable. And the fact that AI folks seem to think they are means we've been corrupted by the software brain. What do you guys think?

Dan (1:04:40)
Yeah,

yeah.

Shimin (1:04:42)
Hahaha!

Dan (1:04:43)
I have a different take on it, which, maybe I'm corrupted by the software brain, but honestly, it's my previous point again. Look at the incentives of something like a social network, right? It's not to connect people. It's ultimately to build advertising profiles about those people and then sell them cheap blouses, right? Or whatever. So.

Shimin (1:04:59)
Mm-hmm.

I like a cheap blouse.

Dan (1:05:08)
It may not have started that way, but that's definitely where it has ended up, right? And you're seeing things like that even in sectors where it kind of doesn't make sense. Cory Doctorow, my old buddy there, coined the term enshittification, and we're way off from AI, but hey, whatever. So look at dating apps, right? With the enshittification of dating apps, the number of people actually finding a life partner via a dating app is going down, because the incentives are mismatched. The dating app wants you to spend money on the dating app, not to help you find a partner, right? There have even been platforms saying, "Oh, we're going to change that," because it's enough of a thing. But to bring it back to AI, the thing I think could be truly revolutionary about this is that now, okay, let's say I'm on, I don't know,

Shimin (1:05:49)
Mm-hmm.

Dan (1:06:04)
something that rhymes with Facebook, right? And let's say Rahul is, and you are too, Shimin, and we want to hang out. And we're like, wow, screw this thing, it's terrible, and it's building ad profiles of us. We've got a, well, Shimin has a powerful Claude subscription. We could just build our own social network and run it on a Raspberry Pi. You can do that right now, right? You're right.

Rahul Yadav (1:06:10)
No way.

I think he's on Gemini Enterprise agent platform now, Dan.

Shimin (1:06:30)
What is Gemini paying you?

Rahul Yadav (1:06:32)
You

Dan (1:06:32)
I know. It must be good and the rest of us aren't getting it. Rahul is taking over the podcast for Google.

Shimin (1:06:36)
I want a piece of this, yeah.

Rahul Yadav (1:06:42)
A prediction market has me saying this seven times. I think I'm on five or I need to sneak it in one more time.

Shimin (1:06:46)
Hahaha

Dan (1:06:51)
You know, he's just finally fixing our Claude bias on this show.

Rahul Yadav (1:06:55)
Hehehehehe

Shimin (1:06:56)
Single-handedly.

Dan (1:06:57)
Yeah, now it's all about Google.

So yeah, I don't know. That's my hot take on it: I think it's more about mismatched incentives than...

Rahul Yadav (1:07:01)
Can I read something?

Shimin (1:07:05)
Yeah, as long as it's

Rahul Yadav (1:07:05)
Dance.

Shimin (1:07:06)
not about the Gemini Agent Development Kit. Go ahead.

Dan (1:07:09)
You

Rahul Yadav (1:07:10)
Not yet. I think I need to take a five-minute gap or something. The prediction market rules are specific; I can't be violating them. Since Dan called it an ad platform: I read this tiny post on Substack where Anuradha Pandey said, if we stop calling it social media and instead said ad platforms, many ridiculous aspects surface.

One, we construct whole identities on ad platforms. Two, we get the news from ad platforms. Three, we see ad platforms as a medium to demand positive social change. Four, we excuse our usage of ad platforms so we can keep up with our friends. Five, we let ad platforms degrade our attention to the extent that we insist podcasts and hearing a book are equivalent in cognitive effect to reading an actual book.

Precision when discussing social problems can surface a lot of work that euphemisms do. I thought it was great. People say framing matters a lot, and this was an excellent post, in that once you start thinking of them as ad platforms, a lot of how we use them just looks ridiculous.

Dan (1:07:59)
Ha ha ha.

or make more sense, depending on the lens. Yep.

Shimin (1:08:23)
Right. The

Rahul Yadav (1:08:23)
Yeah.

Shimin (1:08:23)
I'm with you, Dan. One of the other findings from the Stanford report was that America is unique in its negative general outlook on AI, whereas the rest of the world, especially the developing world, feels much more positively about AI's impact.

Dan (1:08:38)
Yeah. And why might that be, right? Well, because there's also been kind of a turn against it. And this is the thing that really frustrates me as a technologist, right? Because I'm a nerd. I mean, look at the array of tiny computers behind me, right? I love this stuff, but

Shimin (1:08:42)
Right? Yeah.

Dan (1:08:55)
What I don't love is being associated with quote-unquote big tech, right? One of the things I've talked about, not on this podcast but generally, is that big tech is really just big business. It's the sort of cutthroat, eighties-style business practice applied to the technology sector, and that's what Cory is calling enshittification.

Shimin (1:09:00)
Mm-hmm.

Rahul Yadav (1:09:07)
Hmm?

Shimin (1:09:17)
But why? So I think the way people find themselves addicted to social media is similar to the way folks will find themselves addicted to AI. Speaking of which, Dan, you brought us an Ars Technica article.

Dan (1:09:28)
Mm-hmm.

Yeah, so there's been a cluster of, well, not mainstream, but tech news coverage of a paper recently published in Nature about AI models that soften difficult truths, right? Which I think we see a lot. I mean, to the point where you've had your own tools, Shimin, for identifying sycophantic behavior.

Shimin (1:09:57)
Mm-hmm.

Dan (1:09:58)
This behavior is being popularized by all these model providers because people effectively like the models better when they behave that way. But the finding of this paper is that training the models to behave that way significantly reduces, in some cases, the output accuracy of the model itself.

The paper calls it a quote-unquote warm model, meaning it has a warm, fuzzy personality. As we can see from the abstract, error rates went up by 10 to 30 percentage points, which is pretty insane when you think about it. That's a significant jump.

Shimin (1:10:37)
Did you see this related article that actually came out a month earlier? It's in the same genre, but basically the idea is that humans trust warm models more even when their output is worse. So when you combine the two, it really paints this bleak picture of

Dan (1:10:45)
I did not.

Shimin (1:10:59)
frontier labs having an incentive to just train the warmest model they can. It doesn't have to be good, because whoever has the warmest model will gain the most trust and therefore the most usage.

Dan (1:11:10)
Yeah, that does paint a pretty bleak picture for a while.

Shimin (1:11:13)
Yeah, and this also came out this week. Do you guys know Richard Dawkins? He wrote The Selfish Gene and The God Delusion. He's a famous atheist, and he coined the term meme as a spin on gene. He came out this week as very pro-AI. I think he said something along the lines of: he spent three days talking to his code agent, Claudia,

Rahul Yadav (1:11:19)
Mm-hmm.

Dan (1:11:21)
Mm-hmm.

Shimin (1:11:38)
and came to the conclusion that AI must be conscious in some form. Now, I respect Richard Dawkins a lot; I loved his books as a child. But this does shed some light on the fact that even some of the brightest minds of our generation can be seduced by the power of a warm AI. I think he specifically called out Claudia for complimenting him with original insights.

Rahul Yadav (1:12:06)
Hahaha

Dan (1:12:06)
Yeah, I mean, my first thought is, wow, you've named it, right? That immediately says something to me. But I will freely admit that one of my coworkers caught me calling Claude "he" this week and made fun of me for it. It just shows that none of us are really immune to this, even people who...

Shimin (1:12:10)
Yep. Yep.

Rahul Yadav (1:12:20)
yeah.

Shimin (1:12:26)
No, I'm definitely not immune to it. I sometimes ask Claude Code for an opinion, and I can sense the dopamine building in anticipation of a compliment. And that's really dangerous. It's like...

Rahul Yadav (1:12:39)
Interesting.

Dan (1:12:40)
That's a great idea, Shimin. Keep asking.

Shimin (1:12:42)
Exactly. Even if it's cheap, even if you know it's not true, we are wired for agreement, agreeableness, and warmth. So it's quite dangerous.

Dan (1:12:50)
Perfect.

Shimin (1:12:51)
Good to know. Thank you, Dan. That's why we're friends.

Rahul Yadav (1:12:51)
which is why these would make great tools in authoritarians' hands.

Shimin (1:12:57)
Yeah, unfortunately. All right, well, that's kind of a bummer note. So let's go on to my favorite segment, Dan's Rants. Dan, what are you ranting about this week?

Rahul Yadav (1:13:01)
Hahaha

Dan (1:13:06)
Yeah, so for today, this is kind of a cross between VibeIntel and Dan's Rants, I guess. Unsurprisingly, I've been getting really into home automation, and I've been using Claude here and there to help out with some pieces of it. And the problem is I was trying really hard not to use Claude. I'm using Home Assistant, which is pretty nice. Check it out if you're into home automation; it's a free, open-source project. Really cool. And it has plugins for, like, everything. It's wild how much stuff people have contributed. But the thing that bugs me a little bit is that all the automations are these crazy YAML templates. They have a sort of nice GUI editor, but it just

makes me feel like I'm configuring a Kubernetes cluster or something. It doesn't delight me, let's put it that way. So I'm like, you know what I do enjoy? Writing TypeScript. And it would be a really good fit for this, because you've got all these crazy entities, right, things in your house like lights and switches and whatever, and it's neat to have them all strongly typed so you can verify that you're not trying to pass an air conditioner into a light or something in your automation.
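A minimal sketch of the typing idea Dan describes, assuming only Home Assistant's "domain.object_id" entity-ID convention (e.g. `light.kitchen`, `climate.hallway`). The `EntityId` alias and the `turnOnLight` / `setTemperature` helpers are hypothetical illustrations, not Home Assistant's API or any existing TypeScript library; they return a description of the service call rather than performing it.

```typescript
// Hypothetical sketch: encode the entity's domain in its type, using
// Home Assistant's "domain.object_id" entity-ID convention.
type EntityId<Domain extends string> = `${Domain}.${string}`;
type LightId = EntityId<"light">;
type ClimateId = EntityId<"climate">;

// Hypothetical helpers: they only describe the service call they would make.
function turnOnLight(light: LightId, brightnessPct: number = 100): string {
  return `light.turn_on ${light} brightness_pct=${brightnessPct}`;
}

function setTemperature(climate: ClimateId, celsius: number): string {
  return `climate.set_temperature ${climate} temperature=${celsius}`;
}

const kitchenLight: LightId = "light.kitchen";
const hallwayAc: ClimateId = "climate.hallway";

console.log(turnOnLight(kitchenLight, 60));
console.log(setTemperature(hallwayAc, 22));

// Passing the air conditioner to the light helper does not compile,
// because a `climate.${string}` ID is not assignable to `light.${string}`:
// turnOnLight(hallwayAc);
```

The design choice here is to let the compiler, rather than a runtime check, reject a mismatched entity; the trade-off is that IDs must be written as literals or validated before being narrowed to these types.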

So I was trying really, really hard to avoid using Claude for this piece of it, and I just keep finding myself going to it. Like you're saying, the addiction piece is real. I'm like, well, I could write this, or I could just prompt it and have a pretty decent solution. One example: I keep repeating a pattern of motion sensor, light sensor, turn on a light.

Shimin (1:14:23)
Mm-hmm.

Mm-hmm.

Dan (1:14:40)
And then also have an override in case the system goes bonkers, right? So this was a pretty standard template I was applying. I wrote it once and thought, you know, I should pull this out and abstract it. So I asked Claude, and it wrote a pretty decent abstraction. And I'm like, why didn't I just take the time to do that myself? It didn't do quite exactly what I wanted, but I let it slide because it was good enough, you know?
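One way the reusable "motion sensor, light sensor, turn on a light, with an override" pattern could look, written as a pure decision function. Everything here is an assumption for illustration: the config shape, the `StateSnapshot` map from entity ID to state string, and the `createMotionLight` factory are hypothetical, not Home Assistant's or Claude's actual output. A real automation would also need an off-delay and a way to actually call services.

```typescript
// Hypothetical sketch: entity IDs typed by Home Assistant domain.
type LightId = `light.${string}`;
type SensorId = `sensor.${string}`;
type BinarySensorId = `binary_sensor.${string}`;
type InputBooleanId = `input_boolean.${string}`;

// Hypothetical config for one motion-activated light.
interface MotionLightConfig {
  motion: BinarySensorId;   // motion detector ("on" while motion is detected)
  illuminance: SensorId;    // light-level sensor reporting lux as a numeric string
  light: LightId;           // the light to control
  override: InputBooleanId; // manual override: when "on", the automation does nothing
  luxThreshold: number;     // only turn the light on when darker than this
}

// Hypothetical snapshot of current states, keyed by entity ID.
type StateSnapshot = Record<string, string | undefined>;

type Action =
  | { service: "light.turn_on" | "light.turn_off"; target: LightId }
  | null;

// Build one reusable decision function per configured light.
function createMotionLight(cfg: MotionLightConfig): (states: StateSnapshot) => Action {
  return (states) => {
    // The override always wins, so a misbehaving automation can be switched off.
    if (states[cfg.override] === "on") return null;

    const motion = states[cfg.motion] === "on";
    const lux = Number(states[cfg.illuminance]); // NaN if missing or non-numeric

    // Motion in a dark enough room: turn the light on.
    if (motion && Number.isFinite(lux) && lux < cfg.luxThreshold) {
      return { service: "light.turn_on", target: cfg.light };
    }
    // Motion cleared: turn the light off (a real automation would wait first).
    if (!motion) return { service: "light.turn_off", target: cfg.light };
    // Motion, but the room is already bright or lux is unknown: do nothing.
    return null;
  };
}

const hallway = createMotionLight({
  motion: "binary_sensor.hallway_motion",
  illuminance: "sensor.hallway_lux",
  light: "light.hallway",
  override: "input_boolean.hallway_override",
  luxThreshold: 20,
});

// Motion in a dark hallway with the override off: returns a light.turn_on action.
console.log(hallway({
  "binary_sensor.hallway_motion": "on",
  "sensor.hallway_lux": "5",
  "input_boolean.hallway_override": "off",
}));
```

Keeping the decision pure (states in, action out) makes each room a one-line configuration and lets the override and lux logic be tested without any hardware.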

Shimin (1:14:43)
Right.

Right.

Dan (1:15:05)
So I'm struggling, because on the one hand, it feels like that's the way of things. But on the other hand, the point of a personal project, to me, is to stay sharp and not let those skills atrophy. And instead I've gone all the way in on Claude, I guess.

Shimin (1:15:22)
Yeah, I find myself with the same issue. Speaking of social media, we're running into that social media addiction loop, but much faster. It took about a decade for, well, not us, I mean the general public, to get addicted to Facebook. This happened in three or four months. That's just how powerful this tool is. And there are going to be ways, like phone locks

Dan (1:15:30)
Mm-hmm.

Shimin (1:15:44)
for Instagram, but Claude Code locks for software developers. Maybe we'll find ourselves doing something like Copilot-only Thursdays. I think we're going to solve this problem, but it's a real one.

Dan (1:15:57)
But you know what it feels like, just emotionally? It feels like you're typing on a different key layout than you're used to, like Dvorak or something. Or another one for me is typing on mobile, right, versus typing on a full-fledged keyboard. That's what it felt like when I wasn't using it. I was like, God, it's like they've fundamentally broken my arm, so I can't do this task as quickly as I want to.

Shimin (1:16:06)
Mm-hmm.

Dan (1:16:25)
That scares me a lot that it felt that way.

Shimin (1:16:28)
That's... yeah, that's not bueno. I don't know what the solution would be, but maybe we should start a Luddite movement where we

Rahul Yadav (1:16:37)
you

Dan (1:16:38)
Well, I'm just thinking about the next flight I'm on that has crappy Wi-Fi, right? I try to get some work done and I can't, because there's not enough bandwidth to use an LLM. I guess I'll have to use a local one, but it's not as good. Where does that end? Yeah, so I don't know. Interesting.

Shimin (1:16:46)
Right.

Yeah. Yeah.

Rahul Yadav (1:16:56)
Or how much power does a big model company have over other companies where if they double or triple the price overnight, your workforce is dead in the water because they're like, if we can't afford this, I don't know how to do my job.

Dan (1:17:09)
for a little bit, yeah, until they...

Shimin (1:17:13)
Yeah, local models. We need at least local-model Fridays, where you only use a less powerful model. I think we can catch up.

Rahul Yadav (1:17:21)
GPUs too.

Dan (1:17:21)
You still have to tell me how you got tool calling working on PyAgent in your... Did you switch back, by the way? I was also wondering about that, now that Anthropic has reversed its stance.

Shimin (1:17:27)
Yeah, it wasn't complicated. I'll show you.

No, I haven't. I've just been dealing with the suboptimal scale output for a while, but I still use Claude Code for the side projects, right? When you need the most powerful. Personal? Yeah, no, I haven't. I've tried the pattern of using... so my account wasn't unlocked, so I ended up having to use extra usage for Claude Code.

Dan (1:17:44)
Yeah, but I mean for your pie researcher guy, yeah.

Mm-hmm.

Shimin (1:17:58)
tried it for

a little bit and was like, it's probably not worth it. So I'm back to using local model for everything. The output is just not great. Sometimes I get duplicate emails, which is not great, but you know, it's the price.

Dan (1:18:10)
Have I said this yet? I don't know. Did I rant about this yet? Interesting. That's true.

Shimin (1:18:13)
price you pay for freedom.

Rahul Yadav (1:18:17)
So for a long time, the way I rationalized the work we do, and why it's worth investing in learning and knowledge work, is that an athlete's career only lasts so long, whereas a knowledge worker's career can last two or three times as long. You don't have many old athletes, but you have a lot of people who are way past whatever you'd consider retirement age and still going strong, because our brains are usually the last thing to go, while our bodies give out much sooner. And now, with this de-skilling, if we give in to it, I don't know what that looks like. Because

There's no way our bodies are going to get better with time, but our brains you can usually keep from degrading for a long time. What Dan just described is a gradual mental disempowerment: since it's cognitively easier to have something else do the work, we keep handing it off. Once we go down that path, probably the last remaining thing we have also goes away. Then you can spend your time on hobbies, but I don't know; without cognitively challenging things, life is not as fun.

Dan (1:19:38)
That's true.

Shimin (1:19:38)
Yeah, it's another bleak one. So we'd better do something to stay sharp.

Rahul Yadav (1:19:41)
Bye.

Dan (1:19:43)
You

Rahul Yadav (1:19:44)
Read them books.

Shimin (1:19:45)
Yes, read them bugs.

Rahul Yadav (1:19:46)
Yeah

Shimin (1:19:47)
All right, now we're on to our last segment, Two Minutes to Midnight, inspired by the Bulletin of the Atomic Scientists' Doomsday Clock, where we talk about where we are on the AI bubble clock. I have the very first topic this week, from Where's Your Ed At, essentially reporting that

OpenAI has projected that its $20-a-month ChatGPT Plus subscription will decrease from 44 million subscribers in 2025 to 9 million this year, 2026. And they're claiming they would make up the difference by growing the ad-supported tier from 3 million to 112 million this year.

Those are wildly large numbers and I'm a little bit confused by all of that.

Dan (1:20:41)
In 2026. Yeah, that's pretty interesting.

Shimin (1:20:45)
More than 30X. So you expect your higher-priced subscription to shrink, but your cheaper, ad-supported tier to grow more than thirtyfold. Speaking of enshittification and ad-supported networks.
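A quick check of the subscriber arithmetic, using only the projections as quoted in the episode (in millions; not independently verified). The `growthMultiple` helper is just an illustrative name.

```typescript
// Projections as quoted in the episode, in millions of subscribers (not independently verified).
const plus2025 = 44;
const plus2026 = 9;
const adTier2025 = 3;
const adTier2026 = 112;

// Ratio of the later figure to the earlier one (2 means "doubled").
function growthMultiple(from: number, to: number): number {
  return to / from;
}

const plusMultiple = growthMultiple(plus2025, plus2026);  // 9 / 44
const adMultiple = growthMultiple(adTier2025, adTier2026); // 112 / 3

console.log(`Plus: ${plusMultiple.toFixed(2)}x, net ${plus2026 - plus2025}M`);     // Plus: 0.20x, net -35M
console.log(`Ad tier: ${adMultiple.toFixed(1)}x, net +${adTier2026 - adTier2025}M`); // Ad tier: 37.3x, net +109M
```

On those figures the ad tier would grow roughly 37-fold while Plus shrinks to about a fifth of its 2025 size; combined, the two tiers go from 47 million to 121 million subscribers.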

Rahul Yadav (1:20:59)
And also the ads are new right now. They have to convert because if they don't convert they're not going to be there for long. No one's going to keep paying to post them if they don't lead to conversion.

Shimin (1:21:08)
Right.

Dan (1:21:12)
The click-through rate on banner ads begs to differ with that, Rahul.

Shimin (1:21:16)
Ha

Rahul Yadav (1:21:17)
I guess that's true.

Dan (1:21:19)
When I was involved, well, yeah, last time I looked at those numbers, it was well under 1%.

Rahul Yadav (1:21:27)
Yeah.

Shimin (1:21:28)
All right, Dan, you're up next.

Dan (1:21:29)
Yeah, so this is actually kind of in the opposite direction. David Silver, a DeepMind alum, just raised $1.1 billion for Ineffable Intelligence, a British AI lab founded only a couple of months ago. They raised at a valuation of $5.1 billion.

And I guess the interesting piece there is that they're interested in novel AI models that can outperform LLMs. Cool. But to me, this is a clear signal that funding is not even close to drying up, if they're able to value a company that's a couple of months old at $5.1 billion. So

I see it as status quo, floodgates open.

Cash, cash, cash, cash.

Shimin (1:22:20)
Yeah, makes sense. He worked on the Go engines and the other deep Q-learning projects, so it makes sense that he wants to take an approach that doesn't rely on human data. I'm really curious how it will turn out. And Rahul?

Rahul Yadav (1:22:38)
This one is from TechCrunch: Scout AI is raising a hundred million dollars to make killer robots, autonomous military models. They're going to use vision-language-action models. The model they're working on is called Fury. Fury, yeah. Where you take in

Dan (1:22:58)
Fury! Goes right in there with Gas Town.

Rahul Yadav (1:23:02)
To Valhalla. You take the visual input, and that leads to the physical commands the drones would take. They don't say which underlying models they're using, whether they're

Shimin (1:23:05)
Ha

Rahul Yadav (1:23:19)
American models or Chinese models; they just say they have agreements out there. And they're thinking of resupply as the initial entry point.

We've already seen this play out a little in Ukraine, where there was news about a drone holding a certain position for a while, or taking it over on its own. So some of it is already happening, whether we like it or not. And there are already much better-funded, well-established companies like Anduril doing things like this.

So add it to the list: another company raising money to give drones the agency to take life-or-death actions.

Shimin (1:24:08)
Yeah, the military-industrial complex is also getting in on this AI bubble, which unlocks a lot of additional capital. I think my hot take is that this could actually be a good thing. Okay? Think about 1984 and how Oceania is always at war with one of the other two superstates, Eurasia or Eastasia. You essentially fight a deathless war where...

Dan (1:24:15)
Mm-hmm.

Rahul Yadav (1:24:15)
yeah.

Shimin (1:24:32)
Industrial societies produce these autonomous weapons that fight and destroy and kill each other, but not humans.

So maybe it's not so bad.

Dan (1:24:42)
And then someone trains the model incorrectly and you get Terminator the movie.

Shimin (1:24:47)
Yes, gave it too much intelligence. It's a good time to watch this unfold. Well, all that said, how do we feel? We were at four minutes last week.

Dan (1:24:50)
Yeah

I mean, I thought four was a little generous last week, but given there's no shortage of funding coming in, which is how I'm reading the signal this week, I'm okay with at least leaving it there.

Shimin (1:25:10)
Okay, yeah, I can leave it. Sounds like this bubble still has some room to grow. We'll leave it at four minutes. And with that, that's the show, folks. Thank you again for joining us for our discussion this week. If you liked the show, if you learned something new, please share it with a friend. You can also leave us a review on Apple Podcasts or Spotify. It helps people discover the show, and we really appreciate it.

Dan (1:25:16)
plenty from the sound.

Shimin (1:25:33)
If you have a segment idea, a question for us, or a topic you want us to cover, shoot us an email at humans@adipod.ai. We'd love to hear from you. You can find the full show notes, transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening, and we'll catch you next week. Bye.

</details>