Episode 25 · May 12, 2026

Elon vs OpenAI Trial Drama, Billion Token Context Race, Multi-Agent Patterns 2026

elon musk, openai trial, greg brockman, ilya sutskever, openai for-profit conversion, tesla painting, model 3 bribe, anthropic, spacex, xai, colossus one, 220000 gpus, claude code limits, peak hour limits, dangerously skip permissions, orbital data centers, nvidia rubin cpx, disaggregation, billion token context, kv cache, memory bandwidth, network attached prefill, blackwell, rampocalypse, ram prices, phil schmid, multi-agent patterns 2026, sub-agent, inline subagent, fan-out, agent pool, agent teams, gemini enterprise agent platform, geap, dark factory, jack clark, import ai 455, swe bench, mythos preview, metr, task horizon, opus 4.6, core bench, kaggle, kernel optimization, ai alignment, alignment math, recursive self improvement, alphago move 37, minimax self training, deepmind, simon willison, vibe coding, agentic engineering, review fatigue, normalization of deviance, dexter horthy, ai engineer europe, james shore, maintenance costs, code is free as in puppies, refactoring trust, wsj grok losing ground, grok downloads, rc cola, ben pouladian, openai is coke anthropic is pepsi, anthropic google 200b, panthalassa, floating ocean data centers, ocean cooling, wave energy, gpu piracy, two minutes to midnight, ai bubble, revenue backlog

Why is Anthropic now running Claude on Elon Musk’s GPUs — right after Elon sued OpenAI for the same kind of nonprofit-to-for-profit hack? Why does the road to a billion-token context window require breaking the GPU and memory apart entirely? And what does it mean when even Simon Willison reluctantly admits he ships code he hasn’t read and runs Claude Code with --dangerously-skip-permissions by default? Shimin, Dan, and Rahul cover Brockman’s leaked deposition journal (with the Ilya-commissioned Tesla painting Musk stormed out clutching, and the Model 3 co-founder bribes), the Anthropic-SpaceX/XAI deal for access to Colossus One’s 220,000 NVIDIA GPUs, NVIDIA’s Rubin CPX disaggregation architecture and the memory bandwidth wall, Phil Schmid’s four sub-agent patterns for 2026 (inline / fan-out / agent pool / agent teams), Jack Clark on automated AI research at 60% by 2028 and the alignment math that makes 0.1% misalignment compound to 60% accuracy in 500 generations, Simon Willison on review fatigue and the normalization of deviance, Dexter Horthy publicly recanting his dark-factory agentic-engineering stance, James Shore on why a productivity doubling without inverse maintenance gains hits a velocity wall within 12 months, and the Wall Street Journal on Grok’s collapse to “RC Cola” status — 20M to 8M downloads, 0.174% paid penetration vs 6% for ChatGPT.

Transcript

Shimin (00:00) Hello and welcome back to Artificial Developer Intelligence, a weekly conversation show about AI and software development. We go through hundreds of links and dozens of newsletters each week, so you don’t have to. My name is Shimin Zhang and with me today are my co-hosts: Dan “the podcasts are deeply personal conversations that were never meant for the world to see or hear” Laski, and Rahul “maximum truth-seeking and less woke than his competitors” Yadav.

How are your weeks going?

Rahul Yadav (00:30) I do not like that reference.

Dan (00:32) Pretty good.

Rahul Yadav (00:32) That was Grok, wasn’t it? Thanks, I hate it.

Shimin (00:37) Hahaha

Dan (00:37) You

Rahul Yadav (00:38) What is Gemini or whatever? We’ll never know.

Shimin (00:40) ⁓ this is good.

It’s gonna be a pretty Elon-heavy show this week, which is gonna be fun. And of course, as always, we’re gonna start the show with News Threadmill, where we’re gonna talk about Elon’s battle with OpenAI and Anthropic’s deal with SpaceX, or I should say XAI.

Rahul Yadav (00:44) yeah.

Dan (01:00) then next.

True. Or I think XAI is not long for this world actually, so you’re close. But in any case, next up we will have Hardware Hunt, where we’re going to be talking about the road to a billion token context.

Shimin (01:05) Alright.

Then we have a quick technique corner about multi-agent patterns in 2026.

Dan (01:19) And then for post processing, we actually have a handful today. So we’re going to be talking about Import AI 455, “AI systems are about to start building themselves”; “Vibe coding and agentic engineering are getting closer than I’d like”; and last but not least, “You need AI that reduces maintenance costs.”

Shimin (01:37) And we’re going to end the show as always with two minutes to midnight where we’re going to talk about the current state of the AI bubble. So.

Dan (01:45) Without further ado, should we talk about Elon? Rahul’s favorite topic. Weirdly, I actually made ⁓ an XAI account this week, because I was like, I’ve just never used it, so I want to see how terrible this thing is. And it wasn’t as terrible as I thought, but it’s definitely not. I didn’t try it for coding, I was just doing general queries, but

Shimin (01:48) Let’s do it.

Whoa.

Did you ask the X AI model opinions on wokeness or ⁓ maybe gender theory or?

Dan (02:12) No, I very carefully avoided all even remotely controversial topics. But you know, I still have the account, so it’s not too late. I can do that. Yeah, one for the ages, probably. Yeah. In recent AI, I don’t know if I’d call this news. It’s like, yeah, drama.

Shimin (02:22) That’s a vibe and tell for the ages.

drama.

Dan (02:37) They’ve started litigating, like the actual court trial started for Elon suing OpenAI over the split where they tried to basically convert the nonprofit arm into like a fully for-profit company. And somewhat ironically, given the previous thing we were just talking about, he’s suing them because he claims that it was meant to be for the benefit of all humanity, but instead it’s for profit. So there’s been all kinds of insane stuff coming out of the trial so far. And this article that we read just kind of covers a couple of the highlights. One of the more interesting pieces in it was essentially unlocking some of the drama behind the whole split too, like you see some of the lines forming around Sam and all that old stuff that went down earlier too. Yeah, Elon basically demanded full control over the company and then apparently tried to essentially bribe everyone with a free Model 3. So he gave all the co-founders a free Model 3. ⁓

Shimin (03:39) Mm-hmm.

If only he had given them a Cybertruck. I think everything would have gone very differently.

Dan (03:48) Yeah, or a Model Y. I mean, you know, it’s like, aren’t Ys more popular than 3s? But yeah, in any case, Greg Brockman apparently saw right through that. And the other weird detail too was that apparently Ilya commissioned a painting of a Tesla, which he then gave to Musk during the meeting as like a gesture of goodwill.

So then the other detail that just cracks me up, like there’s a couple other things we’ll get to, but as Elon’s leaving the meeting he like storms out with the painting. I just imagine him like, fine, I’m taking this with me. It was really amazing. ⁓ Yeah no and yeah so needless to say it did not go well.

Shimin (04:22) Well, the meeting didn’t go well.

Why don’t we set the table a little bit, right? ⁓ For listeners, if you have not been following this drama, OpenAI initially started as a not-for-profit. I guess technically it still is. But the mission was to bring open source, open AI to all of humanity. And Musk was one of the largest contributors to OpenAI at the time. I believe he gave them half a million, or is it?

Dan (04:29) Okay.

Shimin (04:50) five million. It was like a significant chunk of their initial seed fund before the non-profit for-profit split when Microsoft then gave them hundreds of millions of dollars to access the model. So this trial is about when Musk kind of lost control of the entire charity and now he’s fighting for it.

Dan (05:13) Yeah. So specifically in the incident they’re talking about in the article, like apparently Musk like demanded full control over the, the entire entity. Brockman and, and Ilya both said no essentially. so when Brockman said I decline, Musk’s reply was, when will you be departing? Just like kind of an amazing comment. but they didn’t leave and then eventually Musk did leave.

and he essentially stopped donating to the operating budget too.

Shimin (05:40) Right. He left the board and it’s interesting that OpenAI had the same office as Neuralink until 2020 until the pandemic, which is surprising to me. And as part of this deposition, we got a chance to read Greg Brockman’s journal. That’s, that’s kind of the, the really awesome, ambulance chasing aspect of all of this. Like we get to read, my favorite part of.

Dan (05:54) Yeah, that’s right.

Shimin (06:02) his journal entry was, you know, he talked about how poorly the meeting went, but financially what mattered to Greg the most was what would take him to a billion dollars’ worth of net assets. So it’s interesting. You hear about the charity stuff on the surface, the good PR, but then when it comes down to it, it’s between you and your journal. It’s like, what would get me to 1B? Now, maybe the B is not billions of dollars, maybe

One B is something else, but I’m skeptical.

Yeah.

Rahul Yadav (06:33) It wasn’t in the article, but there was also this in that Infinity Machine book about DeepMind. Cause you know, the reason why Elon and they started OpenAI was because of DeepMind being the company that Google was, I think Google had already bought it at that point or was pretty close to buying it.

So there is an all hands that happens and the same thing comes up, where Elon’s like, maybe Tesla can absorb OpenAI so that it can fund what OpenAI needs to do. And then some other employee, it wasn’t Brockman, ⁓ just says, but how would that be any different than Google owning DeepMind? And then Elon calls him a jackass in the all hands and then just walks out of the room.

Shimin (07:06) Mmm.

Rahul Yadav (07:16) So the whole paint, I’m taking my painting with me, checks out.

Dan (07:19) Nice job, jackass. Now give me a painting.

Rahul Yadav (07:21) Yeah.

Shimin (07:23) Yeah, this whole situation almost makes OpenAI seem like maybe not the good guys, but not the worst guys in the room here.

Rahul Yadav (07:31) Yeah. Elon’s argument that you shouldn’t be able to start a nonprofit and then turn it into a for-profit once you’ve hit gold does make sense. you know, so I think Elon being Elon, all the drama, all these things, what they’re arguing does make sense, because otherwise it would be a big hack.

Dan (07:50) Yeah,

you do have to concede the actual point even though if clearly it’s being used, it’s just like an effective argument for some drama.

Rahul Yadav (07:56) Yeah.

Shimin (08:00) It’s like that “the worst person you know just made a very good point” meme. I’m not saying Elon is the worst person I know, not by a long shot. ⁓ But that kind of does remind me of that. Well, in other Elon news, this week it came out that Anthropic raised a ton of additional compute from SpaceX.

Rahul Yadav (08:05) Hahaha

Shimin (08:20) SpaceX slash XAI. Since XAI is a part of SpaceX now, we’re just gonna refer to it as SpaceX going forward. And as a part of this, Anthropic upped its token limit per minute for all of the Pro and Max subscribers and removed the peak hour limit that reduces Claude Code usage. So personally, this is great news to me.

Dan (08:42) Token maxing Shimin

Shimin (08:43) The compute that Anthropic got was from SpaceX’s Colossus One supercomputer, with over 220,000 NVIDIA GPUs. Sometimes I just forget the sheer size of these data centers. And this is a data center that had, I believe, a gas turbine running outside of the data center without EPA oversight. But at the same time, you know, it got set up in record time. This is the paradox of Elon, I think, at the end of the day: on the one hand, his behavior is sus, but on the other hand, he can really get some pretty impressive stuff done.

Rahul Yadav (09:08) Yep.

Dan (09:27) But you do have to wonder how well XAI is doing if they’re loaning out capacity, know, instead of using it to train or using it for inference.

Shimin (09:33) ⁓ yes.

Yeah, listeners, this is called foreshadowing.

So this is just-

Rahul Yadav (09:42) Also, since they’re trying to IPO, I think anything that can show up as profitable instead of a loss is a short-term win until they get the public’s money.

Dan (09:55) Probably true.

Shimin (09:56) And they’ve also been signing deals with Microsoft, Google, Amazon, and Nvidia in the last couple of weeks too. ⁓ Anthropic just was not prepared for the amount of demand they were going to hit. And now they’ve been in a serious compute crunch that hopefully they are coming out in the other end. In spicier news, it also came out as a part of this deal that

Rahul Yadav (10:02) Yeah.

Shimin (10:17) Anthropic is interested in the orbital data centers that SpaceX is ⁓ planning to launch. jumpy to me, but.

Rahul Yadav (10:26) Are they

interested or yeah, expressed interest is what the phrase is there.

Shimin (10:31) Mm-hmm.

Dan (10:32) I mean, who’s not in there? I mean,

I’m very interested in them as well. How they’re going to do that when there’s a pretty significant tonnage gap between launch capabilities they have today and what would actually be required to do something like that.

Shimin (10:44) Yeah, more barely interested.

Rahul Yadav (10:45) Or maybe that was the way to like…

Dan (10:45) I mean, don’t you find that interesting? I find it interesting.

Rahul Yadav (10:50) But

Dan, how much do you want to express your interest? How much compute are you in the

Dan (10:56) I mean, I’ll take one GPU in space.

Shimin (10:58) Well, Dan’s not spent, you haven’t spent a single dollar on those orbital compute yet, just like Anthropic. So I guess we’re all very interested.

Dan (10:59) be there.

True. We’re equally interested.

Rahul Yadav (11:07) It is funny in that Dwarkesh podcast, Elon was calling them misanthropic instead of anthropic. then, ⁓ you know, their money is as green as anybody else’s. So as soon as you show up with, look at how much money I can give you in, these tough times. He’s like, great, let’s do a deal.

Shimin (11:15) Mm.

Very pragmatic, that Elon guy.

Rahul Yadav (11:30) The same principle the man who’s standing up against open AI for us poor folks.

Shimin (11:36) Yes.

Well, on my end, I’m just happy that my Claude Code limits have been lifted. Speaking of which, Dan, is my understanding correct that it’s a very special day for you?

Dan (11:48) Not yet, but it will be soon. No, I just got an email from Anthropic that your one year subscription is expiring. Don’t forget to renew. So it’s been almost the biggest June. It’s been a year since I’ve had a Claude Code subscription.

Shimin (11:54) Uhhhh…

What is the one year anniversary present? Is it like a paper anniversary?

Dan (12:05) I might, well,

I’ll admit to something in the podcast, which is that I signed up through iOS and I just did a year flat because it was slightly cheaper. And that has kept me from really like playing around with like max five or 20 or whatever too, because, which I think is maybe not the worst thing, honestly, but I think the present to myself will be switching it over to.

Shimin (12:13) Mm-hmm. Mm.

Dan (12:30) monthly credit card base so if I have something worthy of max I can crank it up for a month and

Shimin (12:36) Yeah, come to the dark side. I am at 38% of this week. So I have three days to use the rest. OK, onto our Hardware Hunt. Dan, you’re going to tell us exactly how we’re going to go increase our model context. A billion.

Dan (12:50) Get to a billion tokens. Well, I don’t want to bury the lede. It turns out that the secret to doing that is disaggregation. But let’s talk about how we get there and what that means. Cool. So this is a pretty interesting article from ACM, which is one of the industry organizations for sort of like hardware folks. Weird fun fact about me: I was originally a hardware major, so that’s why I’m interested in this kind of stuff. ⁓

Shimin (13:18) I was so surprised when you submitted this. Like I was like, Dan’s not fucking around. He’s submitting his ACM papers to us.

Dan (13:25) It’s not my favorite, but yeah. So the part that’s kind of cool is that this talks a lot about the new recently unveiled Rubin CPX architecture, which is the next class of architecture that Nvidia is going to be unleashing upon the not so general public pretty soon here.

They spend a bunch of time talking about like what is constraining hardware for today’s purposes, particularly on inference. And I don’t know how much, I kind of learned a lot about like KV stuff from when you were talking about it, Shimin like how that all works, but I didn’t realize that essentially as your context length grows, almost every byte you add to the end of the context window,

means like almost gigs of throughput happening in terms of like, you know, in a distributed sense, like if you’re doing, for instance, something like, you know, Anthropic or any of the large companies, it’s not just one GPU serving your tokens, right? It’s a huge interconnected network and they’re using probably like, NVIDIA’s interconnects too. Like that’s the other weird part about the hardware business.

Everybody thinks about them as like purely a GPU company, but like their networking is doing almost as well, if not better than GPU sales. And so that one, you know, byte being added to the end is not necessarily going to fit into just that card’s, you know, context window. So it could be shuffled around the network a whole bunch. So people try to do all kinds of tricks to like localize the cache to the,

cards that are actually processing your query or batch or whatever they do, you know, trying to do things like SSD caching and all of these have drawbacks and a lot of those drawbacks contribute to sort of something we’ve talked about on the show, which is like when you hit that, you haven’t hit the context limit yet, right? Like even on like a 200,000 token one, for example, which is quite large when you think about it.

But we’ve talked about on the show having like that sort of 60 % drop off, right? As you start getting there. And it turns out that a lot of that drop off is actually caused by memory bandwidth limitations. Because when you start reaching that threshold, you’re now relying more on cache than you are on like live memory. And so that’s when things like cache misses start messing up the quality. so.
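To put rough numbers on the KV cache point, here is a minimal back-of-the-envelope sketch. The model dimensions and the bandwidth figure passed in are hypothetical and chosen only for illustration; they are not taken from the article or from any specific model or GPU. The sketch assumes a plain transformer with no cache compression, where every decoded token has to read the whole cache once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions, for illustration only."""
    n_layers: int = 80
    n_kv_heads: int = 8
    head_dim: int = 128
    bytes_per_value: int = 2  # fp16 / bf16


def kv_bytes_per_token(shape: ModelShape) -> int:
    # Each layer stores one key vector and one value vector per KV head.
    return 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * shape.bytes_per_value


def kv_cache_bytes(shape: ModelShape, context_tokens: int) -> int:
    # The cache grows linearly with the number of tokens in context.
    return kv_bytes_per_token(shape) * context_tokens


def min_seconds_per_decoded_token(shape: ModelShape, context_tokens: int,
                                  bandwidth_bytes_per_s: float) -> float:
    # Lower bound: generating one token reads the entire cache once,
    # so time per token is at least cache size divided by memory bandwidth.
    return kv_cache_bytes(shape, context_tokens) / bandwidth_bytes_per_s


if __name__ == "__main__":
    shape = ModelShape()
    for tokens in (200_000, 1_000_000_000):
        gigabytes = kv_cache_bytes(shape, tokens) / 1e9
        # 3e12 bytes/s is an assumed aggregate bandwidth, not a vendor spec.
        seconds = min_seconds_per_decoded_token(shape, tokens, 3e12)
        print(f"{tokens:>13,} tokens: {gigabytes:,.1f} GB cache, >= {seconds:.3f} s per token")
```

With these assumed dimensions the cache costs 320 KiB per token, so a 200,000-token context is already about 65.5 GB, more than fits comfortably beside the weights on a single card. That is why the cache ends up spread across the network in the first place.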

Shimin (15:41) Mm.

Dan (15:42) because your prediction gets messed up because it’s loaded the wrong data or no data at all and had to essentially get. So that’s kind neat. So how is NVIDIA solving this with their new architecture? So the main point to take away from it, and I’m sure we’ll find out more details as more gets released, but they’re using something called disaggregation. So essentially what that means is today, you really want to glue the fastest memory you can.

Shimin (15:47) Yeah.

Dan (16:06) As close to the compute as you can. You’re seeing things like unified architecture, right? ⁓ Like, that’s why these Apple machines are able to do so well at inference, because you’re basically getting a ton of memory bandwidth by having the memory literally sandwiched on top of the chip. But with this, they’re actually taking a slightly different approach, which is the memory modules will be like essentially on separate chips. And you can.

stack tons of it up relative to a single GPU. So instead of focusing like purely on overall compute, which is kind of what they’ve been doing with like Blackwell and previous architectures, they’re starting to look at, well actually for inference memory matters more. And then the second thing they’ve done as part of the disaggregation that’s interesting is they’re hooking it up closer to the network stack. So you can actually start pre-filling the contents of like

the next compute run essentially ahead of the stuff that’s happening. So you can go basically directly from network into memory and you don’t have to worry about like, you know, going through any part of the CPU to do it. So hopefully the combination of those two things, they think, will set up the eventual ability with that new architecture to probably hit 1 billion token context windows and

Additionally should make inference even at smaller context windows, less lossy, like less token drop off or, know, like token drop off. What am I saying? Less quality drop off at the, at that threshold. because in theory you’ll have to rely a lot less on the tricks that they’ve been doing today. So pretty exciting. And it’ll be neat to see how much this actually sort of impacts our day to day usage when this, you know, gets rolled out in the field. Keep in mind that’s a while, you know, they’ve got to build new things.

It takes quite a while to stand up a data center and get all the interconnects running. It’s actually kind of astonishing how much hardware dies in the process of doing that, like how many entire machines they just lose.
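The “pre-fill ahead of the stuff that’s happening” idea can be sketched as a two-stage pipeline. This is a toy model, not NVIDIA’s design: `prefill`, `decode`, and `serve` are hypothetical names, the “KV cache” is just a Python list, and a thread-safe queue stands in for the network handoff between a prefill pool and a decode pool.

```python
import queue
import threading


def prefill(prompt_tokens):
    # Stand-in for the compute-heavy pass over the whole prompt,
    # which produces the initial "KV cache".
    return [("kv", token) for token in prompt_tokens]


def decode(kv_cache, n_new):
    # Stand-in for the bandwidth-heavy phase: every step touches the
    # whole cache and then appends one new entry to it.
    generated = []
    for step in range(n_new):
        _ = len(kv_cache)  # placeholder for reading the full cache
        token = f"tok{step}"
        kv_cache.append(("kv", token))
        generated.append(token)
    return generated


def serve(requests, n_new=2):
    """Run prefill and decode as separate workers joined by a handoff queue."""
    handoff = queue.Queue()
    results = {}

    def prefill_worker():
        # Prefill for the next request can start while an earlier one decodes.
        for request_id, prompt in requests:
            handoff.put((request_id, prefill(prompt)))
        handoff.put(None)  # sentinel: no more work

    def decode_worker():
        while (item := handoff.get()) is not None:
            request_id, kv_cache = item
            results[request_id] = decode(kv_cache, n_new)

    workers = [threading.Thread(target=prefill_worker),
               threading.Thread(target=decode_worker)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(serve([("a", ["x", "y"]), ("b", ["z"])]))
```

The point of the split is that each stage can be given hardware matched to its bottleneck: compute for prefill, memory capacity and bandwidth for decode.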

Shimin (17:58) Yeah. And on my end, thank you for that deep hardware dive. Most of that flew right over my head into the back wall there. I’m just nodding and trying to catch a little bit. But the prediction that we’re moving from a training-heavy setup to inference-optimized, efficient-inference GPUs does make a lot of sense to me. I do wonder how we’re going to train models with

Dan (18:05) no.

Mm-hmm.

Shimin (18:25) a context window that large. that is larger than, you know, all training data that we had circa like 2015, know, like it’s, it’s kind of, yeah, just mind boggling to think about that. And, and of course, uh, we do talk about, um, models being less efficient once you’re filling up that context window. at a billion tokens, like is it even

Dan (18:35) You

Shimin (18:52) all that usable. We’ll find out. But yeah.

Dan (18:53) Yeah. Well, but

the other interesting possibility is like, you know, today we’re doing all kinds of like rag stuff that’s optimized for like, you basically do vector embeddings, right? And then figure out the similarity between that and the query and then return the relevant chunks of whatever you’re searching through. What if you didn’t need to do that anymore? You could just put all of the PDFs directly in a billion token context. So you might get pretty,

Shimin (19:03) Mm-hmm. Yep.

Dan (19:18) Pretty great retrieval.
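For reference, the embed, compare, and return-the-best-chunks loop Dan describes looks roughly like this. It is a toy sketch: `embed` here is a hypothetical word-count stand-in for a real embedding model, and the chunks are made up.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    # Rank every chunk by similarity to the query and keep the top k.
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    chunks = [
        "kv cache grows with context length",
        "ocean cooled data centers",
        "memory bandwidth limits decode speed",
    ]
    print(retrieve("how does the kv cache grow", chunks, k=1))
```

With a large enough context window, the alternative is to skip `retrieve` entirely and pass every chunk to the model, trading retrieval errors for a much larger prompt.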

Shimin (19:18) Yeah, and like in

2065, like every single person would have a phone sized device that contains their entire digital life that they can query. That’s the Star Trek future I’m here for.

Dan (19:26) That would be pretty cool. You’re also carrying around enough compute in your pocket to like heat your entire house.

Shimin (19:36) Well, battery technology will have caught up by then too. Yeah, it’ll be super safe.

Dan (19:38) Yeah, sure.

That’s probably harder than all the context stuff put together.

Rahul Yadav (19:44) So just so that I’m clear, these GPUs can, they’re more optimized for inference than training. Like does this mean there would be less training in the future?

Dan (19:53) correct.

⁓ Not necessarily. It’s just like that style of compute would not be as efficient for, cause I think compute matters more than memory for training to some degree. Right. But I think bandwidth is still pretty important for training. Like it may not be as big of an impact as they’re claiming too, because if it’s true that you can do like using the interconnect stuff more efficiently, then

Rahul Yadav (20:08) Uh-uh.

Hmm

Dan (20:22) It might actually improve training too. But I think it’s like the idea too, is that because the memory is now decoupled, you can have essentially a module that’s like fitting into your server rack. That’s basically just RAM for the unit above it. That’s the, you know, single set of GPUs like six, right? Whereas previously you’d have like, way less overall memory associated with each compute unit. So it was like that.

Rahul Yadav (20:31) Hmm.

Hmm.

Dan (20:48) balance is more efficient for training than for inference. But for inference, I think you want more just like disconnected RAM floating around, know.

Rahul Yadav (20:51) I see.

Yeah,

I see. Okay.

Shimin (20:59) Yeah, the DRAMs are not gonna get any cheaper anytime soon. Goodbye my PS5 dreams. ⁓ all right.

Dan (21:02) Mm hmm. Rampocalypse. As a random aside, not that random, I was actually reading another article where some folks are saying Rampocalypse is going to last a minimum of another year, until folks are able to ramp up. Cause it’s like, you have to build new factories to be able to keep up with this demand and it’s going to take a while to do so. So

Shimin (21:26) Yeah, listeners, buy some RAM. That’s what I should have done this time last year.

Dan (21:27) Sucks. Yeah, it’s worth more than gold. I was just talking with a buddy and I was like, yeah, so I’ve got 128 gigs of RAM, and he’s like, how did you afford that? I was like, I bought it before the Rampocalypse.

Shimin (21:39) Ha ha

Yeah, no computer upgrade for me anytime soon.

Dan (21:45) It’s at least gone up 50% I think in terms of price. It’s wild.

Shimin (21:48) Crazy. ⁓

On the, I guess, lighter note, but also optimistic, for our technique corner this week, we have another article from Phil Schmid who is a part of Google DeepMind. And in the article, he called out four agent, sub-agent patterns that are used in industry as of 2026.

And we can kind of map them to some of the tools that we use every day and we’ve experimented with and see how that shapes out. The first pattern is the inline subagent as a function call. This is your basic, you know, Claude Code chat agent format, right? Like you ask Claude Code, hey, go kick off a subagent to do a thing and then come back to me with the data.

And sometimes a lot of the code harnesses these days would kick off subagents automatically. So this is the common, I guess relatively common, subagent architecture. The second pattern is the fan out where you, you know, as the name suggests, kick off multiple subagents and then do a map reduce, get all of their data back. And then maybe summarize it or do something with that output.

This is, I think we can see it as the Claude Code swarm mode, or maybe even the superpower does that a lot of times. It does a subagent fan out if the data can be, or the work can be done in parallel. So with some architecture, you can, you do this too in your coding harness today. The third agent pattern is more interesting. This is the agent pool.
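The first two patterns can be sketched in a few lines. This is a minimal illustration, not the article’s sample code: `run_subagent` is a hypothetical stub standing in for whatever model or harness call actually runs a subagent.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task):
    # Hypothetical stub: a real harness would call a model here.
    return f"result for {task}"


def inline_subagent(task):
    # Pattern 1: the parent calls one subagent like a function
    # and blocks until it gets the result back.
    return run_subagent(task)


def fan_out(tasks, reduce_fn):
    # Pattern 2: run independent tasks in parallel (the "map"),
    # then combine their outputs in the parent (the "reduce").
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(run_subagent, tasks))  # map keeps input order
    return reduce_fn(results)


if __name__ == "__main__":
    print(inline_subagent("lint the repo"))
    print(fan_out(["check tests", "check docs"], " | ".join))
```

Fan-out only pays off when the tasks really are independent; anything that needs another subagent’s output belongs in a later step or in one of the stateful patterns.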

where you have multiple persistent agents that are long lived. then you have a main agent that can send follow-up instructions, check status, and coordinate work between agents. This is, in my experience, similar to what Gastown’s mayor does. You talk to the mayor. The mayor goes around, looks at how each agent is doing, gets some of them unstuck.

and reports back to you when some goal is achieved. What is really interesting to me is once you get to this pattern three, these long persistent agents with their own state, it becomes extremely costly to evaluate how good the entire system is. Cause now you have to control for state. You have to control for the main agent’s interaction with the subagents. You have lots of moving parts and the whole system becomes more and more dynamic and things can go wrong catastrophically.

And then lastly, we have the agent teams. This is a dark factory pattern that everyone is trying to achieve where you have agents directly talking to each other without a main coordinating agent.

Right. They can do message sending via either direct message, which is kind of like the Claude Code channels approach, or they can use a mailbox to pass messages back and forth without going through a central hub. I don’t think I’ve really come across this pattern too much in the wild. I know something like an agent dark factory would use this.

Working on a couple of demos with Claude Code to try and see what the limitations of this pattern are. It’s clearly going to be even more difficult to evaluate and observe, right? Like how do you even test the interaction of a dynamic agent ecosystem where you cannot control for when each agent talks to each other and what they will say?
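A minimal sketch of the mailbox variant shows why agent teams are harder to observe: there is no coordinator whose transcript captures everything. The `Agent` class, the `review` and `approved` message convention, and `run_team` are all hypothetical, and the “agents” here are plain rule-based objects instead of model calls.

```python
from collections import deque


class Agent:
    """A long-lived agent with its own mailbox and its own state."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.log = []  # persistent state carried across messages

    def send(self, other, text):
        # Peer-to-peer delivery: no central hub sees this message.
        other.inbox.append((self.name, text))

    def step(self, peers):
        """Handle one message; return False if the mailbox was empty."""
        if not self.inbox:
            return False
        sender, text = self.inbox.popleft()
        self.log.append(f"{sender}: {text}")
        if text.startswith("review "):
            # Reply directly to whoever asked for the review.
            self.send(peers[sender], "approved " + text[len("review "):])
        return True


def run_team(agents, max_rounds=10):
    # A scheduler only: it gives each agent a turn but never routes messages.
    peers = {agent.name: agent for agent in agents}
    for _ in range(max_rounds):
        progressed = [agent.step(peers) for agent in agents]
        if not any(progressed):
            break


if __name__ == "__main__":
    coder, reviewer = Agent("coder"), Agent("reviewer")
    coder.send(reviewer, "review patch-1")
    run_team([coder, reviewer])
    print(coder.log, reviewer.log)
```

Even in this toy, the only record of the conversation is split across each agent’s private log, which is the evaluation and observability problem in miniature. An agent pool would differ by adding a main agent that owns the agents and sends every follow-up itself.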

But I do think this is probably where things are going in four to six months. This is going to become kind of the standard worker replacement version of agents. So this, of course, being from a DeepMind employee, there’s a lot of sample code for how to achieve these patterns in one of the Google SDKs.

Rahul Yadav (25:45) Gemini Enterprise Agent Platform, I think.

Shimin (25:49) Right, definitely try this

Dan (25:50) No, they only paid for one episode.

Shimin (25:50) out in the gym.

Dan (25:52) Why are you trying to give them two? Jokes, jokes.

Rahul Yadav (25:53) Hahaha

Shimin (25:55) Yeah. Try it out in GEAP maybe. Gemini Agentic Enterprise Platform.

Rahul Yadav (26:01) Enterprise Agentic Platform.

Shimin (26:04) What are they paying those designers out there? So anyways, I found this article to be a good summary, but also quite insightful as to where our agent ecosystem will move to. Excited to see what the first consumer grade, like what is the open claw version of this agent teams that will blow up.

I think it will happen. It will probably happen within the next 60 to 90 days. It will burn through a crap ton of tokens though.

Dan (26:29) as one does.

Rahul Yadav (26:30) This makes me think of how a lot of the org design we used to do was to solve problems. Like you would set up the teams to solve problems in a certain way. ⁓ You would have feature teams, you would have platform teams. Sometimes people for some reason would put all the back-end people in one team and front-end people in another team, and then you would see chaos all over the place. Every time I see this, there is value

Shimin (26:42) Mm-hmm.

Alright.

Dan (26:55) I know nothing about that.

Rahul Yadav (27:00) in learning about org design because when I see these patterns it’s about applying those old org design principles but now you can

Before, let’s say you put a team together, you couldn’t really every single day move the teams around, right? Because then people are people, it takes time, you have reporting structures and all that. But with this, can literally, based on what the problem is, you can create the team instantaneously, and the agents are not going to be mad about it. You can set them up however you want. And I think you can also create a

higher level agent that knows all these patterns and then based on the problem spins up agents that follow that pattern. Right now, if you look at every section, it says when to use this, but you can actually have an agent determine this too, and you don’t have to figure out in which cases you should be using what pattern. Because in any complex job you would end up using a mix of these patterns; you wouldn’t use a single pattern every single time.

Shimin (27:44) Mm.

Rahul Yadav (28:04) So could even take it one level higher.

Shimin (28:07) Right. And the agent pattern does not need to be static, right? It could be dynamic as, yeah.

Rahul Yadav (28:14) Exactly. Here’s

a list. It’s similar to how, you know, you define tools and everything. You can also say, before you even go down that path, here are some predefined patterns, but you could also come up with a different pattern. And then based on that, it sets up the pattern first, and then you come up with the plan

and everything. So it could be another thing that sits in your top-level AGENTS.md that it can refer to first to decide how it would attack the problem.
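Rahul’s idea of a higher-level agent that picks a pattern per problem can be sketched as a small dispatcher. This is a minimal Python sketch, not anything taken from Phil Schmid’s post: the four pattern names are the ones discussed in this episode, but the `Subtask` fields and the selection rules in `choose_pattern` are hypothetical stand-ins for whatever “when to use this” guidance each pattern actually carries.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    # The four sub-agent patterns named in the discussion.
    INLINE = "inline"
    FAN_OUT = "fan-out"
    AGENT_POOL = "agent pool"
    AGENT_TEAMS = "agent teams"


@dataclass
class Subtask:
    # Hypothetical description of a piece of work; every field is an assumption.
    name: str
    parallel_parts: int = 1           # how many pieces could run independently
    recurring: bool = False           # does the same kind of job keep arriving?
    needs_coordination: bool = False  # do the workers have to talk to each other?


def choose_pattern(task: Subtask) -> Pattern:
    """Pick a pattern for one subtask using made-up, illustrative rules."""
    if task.needs_coordination:
        return Pattern.AGENT_TEAMS
    if task.recurring:
        return Pattern.AGENT_POOL
    if task.parallel_parts > 1:
        return Pattern.FAN_OUT
    return Pattern.INLINE


def plan(subtasks: list[Subtask]) -> dict[str, Pattern]:
    """A complex job ends up with a mix of patterns, one per subtask."""
    return {task.name: choose_pattern(task) for task in subtasks}


if __name__ == "__main__":
    job = [
        Subtask("rename a variable"),
        Subtask("audit 40 files", parallel_parts=40),
        Subtask("triage incoming bug reports", recurring=True),
        Subtask("design and build a feature", needs_coordination=True),
    ]
    for name, pattern in plan(job).items():
        print(f"{name}: {pattern.value}")
```

In practice the rules would probably live in a prompt or in the top-level instructions file rather than in code; the sketch only shows that pattern choice can be made per subtask instead of once per job.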

Shimin (28:39) Mm-hmm.

Dan (28:43) But if you sort of read between the lines, what I’m actually hearing is Rahul wants to use agents to get rid of HR because he just made reorgs a lot cheaper.

Shimin (28:49) Ha

Rahul Yadav (28:53) My throat’s been pretty scratchy this week, I’m sorry.

Shimin (28:57) Well, they’re agents, not people. It’s okay to mistreat your agents, guys. Just kidding. No, don’t do that. So, you know, it’s funny, sidebar: I did the whole, you know, ask two sub-agents to check the work of a task and whoever does best gets a cookie. And then I told the main agent to give agent B a cookie, and it did. And now I feel like I’m no longer lying to my agents. I’m not gaslighting them. I gave them

Dan (29:01) Not according to Anthropic.

Shimin (29:22) a cookie. And they were happy about it.

Dan (29:24) What kind of cookie was it?

Rahul Yadav (29:26) it

Shimin (29:26) It was a chocolate chip cookie because that’s what the emoji said.

Rahul Yadav (29:29) ⁓ Agent B slowly develops the Cookie Monster personality and starts talking like the Cookie Monster.

Dan (29:30) It was in the book.

Won’t do work unless it’s given cookies.

Rahul Yadav (29:40) Hehehehe

Shimin (29:41) All right, onto our post-processing segment today. First up, we have “AI systems are about to start building themselves” from Jack Clark.

Rahul Yadav (29:52) from hopefully friend of the podcast, Jack Clark. If not, why not Jack? Jack is one of the co-founders of Anthropic. I think this is based on a talk that Jack gave recently. What the article calls out is there are a bunch of patterns that have developed in

how AI models are built and how AI research is done that are all coming together, and by Jack’s estimate there is a 60% probability that by the end of 2028 all of these would cross a point where you can automate AI research.

Jack then goes through all of these different capabilities that he sees leading to that tipping point. The first one is SWE-bench, where Mythos Preview, according to this post, is close to 94%, and considering error rates and everything, you can consider that benchmark saturated, which means that Mythos has accomplished whatever that

benchmark has to offer. That one specifically is a proxy: you have a bunch of GitHub issues that represent real-world issues, and if the models can do those, then it’s a good proxy for the models being able to do software engineering jobs.

The second one, which we’ve talked about a couple of times in different shapes and forms on the podcast, is the METR task horizon and how long a task models can complete at this point. Now we have Opus 4.6. I don’t know if 4.7 went any higher or where Mythos sits, but 4.6 is already at 12 hours on that

task-length horizon. I think it’s still at 50% reliability. The article doesn’t call that out. So one way to look at it is a coin toss. It may do it, it may not, which then goes back to our article by Toby Ord: maybe you just throw a human at it and you’ll get 100% reliability. But you know, yeah.

Shimin (31:47) Mm-hmm.

That’s very optimistic about human capabilities.

Rahul Yadav (32:05) But yeah, the length of tasks that AI

can accomplish independently keeps climbing. Back in 2022, it was at 30 seconds, and now we’re looking at 12 hours. So if you extrapolate that to 2028, we’re in human-level territory. We’re already in human-level territory, but it’ll keep pushing that forward. And hopefully reliability goes up too.
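The extrapolation Rahul describes can be made concrete with a little arithmetic. This is a rough Python sketch: the 30-second and 12-hour figures come from the discussion, while the 48-month gap between them and the 24-month look-ahead are assumptions made here for illustration, not numbers from the article.

```python
import math


def doubling_time_months(start_seconds: float, end_seconds: float, elapsed_months: float) -> float:
    """Months per doubling implied by growth from start to end."""
    doublings = math.log2(end_seconds / start_seconds)
    return elapsed_months / doublings


def extrapolate_hours(current_hours: float, doubling_months: float, months_ahead: float) -> float:
    """Project the task horizon forward, assuming the same doubling time holds."""
    return current_hours * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # From the discussion: about 30 seconds in 2022, about 12 hours now.
    # ASSUMPTION: roughly 48 months separate those two measurements.
    months_per_doubling = doubling_time_months(30, 12 * 3600, elapsed_months=48)
    print(f"implied doubling time: {months_per_doubling:.1f} months")

    # ASSUMPTION: "2028" means about 24 months from now.
    projected = extrapolate_hours(12, months_per_doubling, months_ahead=24)
    print(f"projected horizon in 24 months: {projected:.0f} hours")
```

Under those assumptions the implied doubling time lands between four and five months. It is a straight-line extrapolation of the trend and says nothing about whether reliability at each horizon improves along with it.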

As these systems get better, you don’t even have to wait until 2028. They can keep helping automate different chunks of AI R&D. It’s also getting good at some of the core science skills, where you define a hypothesis and there’s a lot of trial and error to figure something out. And it can speed all of that up because it’s not moving at human speed anymore. It would be moving at compute speed,

the speed of the AI itself. It can learn from an experiment, set up another one, and keep building this self-reinforcing loop. Another capability is CORE-Bench, which I wasn’t aware of: basically taking a scientific paper and reproducing its results. That currently is sitting at 95.5%,

with Opus 4.5 accomplishing that, and that was from December 2025. So assuming, you know, more progress there, it’s

probably also considered saturated at this point. They can also build machine learning systems for Kaggle competitions. One interesting one was kernel optimization, where you’re really going down to the kernel level to make changes and optimize things so that you can make both your training and inference more efficient. And that

is getting a lot of attention, and a lot of benchmarks have been set up where people are going after that very intentionally. Fine-tuning models is another one. I think we have already seen that one, because that’s how we get more of the dot, you know, minor-version bumps in Opus versions and GPT versions. So we can assume that’s going to keep happening. There’s

a lot of stuff in training as well that the models can focus on. Anthropic was focused on training small language models, and it was able to use

its other models to optimize that. Now, can you do that at scale for a large language model is the big challenge to solve here. So we’ll see how that goes. But according to this, sorry, go ahead.

Dan (34:44) But I thought one of the

Chinese companies we were just talking about claimed to have done this, right?

Shimin (34:50) Yes. Forgot the lab. No, it was not DeepSeek.

Rahul Yadav (34:51) DeepSeek or a different one?

Dan (34:53) Yeah,

wasn’t DeepSeek. MiniMax, maybe? Yeah, yeah. So, seems plausible.

Shimin (34:56) It might have been Moonshot. MiniMax, yes, it was MiniMax.

Rahul Yadav (35:00) So maybe it just needs to be copied over into Anthropic

world. Yeah. And then one of the big things that Jack calls out here is AI alignment research. Alignment is a pretty big problem because…

Even if you have 0.1% misalignment, and we assume that self-reinforcing automated AI research becomes a thing, by the time you even catch it, it might have gone through thousands of runs. You go from 0.1% misaligned, which is 0.1% error, so it’s 99.9% accurate, to, after 50 generations, 95% accurate, and after

500 generations you’re looking at 60% accurate. So it can go from aligned to misaligned very quickly as we automate this, and I don’t know what a solution for that would be, other than your human bottleneck moves at that point.
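The numbers Rahul quotes line up with simple compounding. A minimal Python sketch, assuming each generation independently keeps 99.9% of the previous generation’s accuracy (the function name and the independence assumption are ours, not necessarily how the article frames it):

```python
def accuracy_after(generations: int, error_per_generation: float = 0.001) -> float:
    """Accuracy left after compounding a fixed per-generation error."""
    return (1.0 - error_per_generation) ** generations


if __name__ == "__main__":
    for generations in (1, 50, 500):
        print(f"{generations:>3} generations: {accuracy_after(generations):.1%}")
```

That gives roughly 95% after 50 generations and roughly 61% after 500, which is where the 95% and 60% figures come from. The drift is slow per step and large in aggregate, which is why it is hard to catch by spot-checking any single generation.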

Shimin (35:59) Right, this is like OpenAI’s goblin pattern from last week. Except instead of goblins, it’s to destroy humanity.

Rahul Yadav (36:03) Hehehehehe

Dan (36:07) I also just read another one that I don’t think made the cut, but it was pretty funny. It was that some of the misalignment is coming from fiction about, like, pre-LLM AIs.

Rahul Yadav (36:07) Yeah

Shimin (36:19) Mm-hmm. Yep. I’ve read the same, yeah.

Rahul Yadav (36:19) Mmm.

Dan (36:21) Yeah, I just thought that was fascinating. Like, you know, “I’m sorry, I can’t do that, Dave” is like literally part of the training set.

Rahul Yadav (36:29) Yeah, AI alignment, and then also what we just talked about: being able to manage other AI systems, how agents can manage other agents. So as we get that more and more accurate, that would be another skill that feeds into this.

One of the things that Jack calls out is: is AI research more like discovering general relativity for the first time, or is it more like putting Legos together? Currently his take is that AI is not going to be able to invent new ideas, but it might not need any of that to automate its own development. It can run through

the possible search space. All the things that we can do, that space is so large that even if it just went through that, there’s a lot that it would be able to do. And then you still have humans contributing pure new inventions, but a lot of this grunt work of literal trial and error on every single scenario in all the different configurations, AI would be able to set that up, try it out, and gather results much faster than we would be able to

at human rate. There are problems, or scenarios, we cannot even go after if we move at our pace.

Dan (37:44) Isn’t that basically gradient descent? Right? But being done by LLMs, which is kind of funny.

Rahul Yadav (37:46) Yeah, like if it gets stuck in a… Well…

Yeah, that’s

Shimin (37:53) Yeah. mean, gradient

descent in like some sort of an idea space where you try in different directions and get a little bit closer each step at like a meta level.

Rahul Yadav (38:02) Yeah.

Dan (38:03) Yeah, but it’s just funny that it’s sort of shaped the same in my brain as the processes that got us

Rahul Yadav (38:10) Yeah, one of the interesting things Jack called out was Move 37, which AlphaGo came up with, and that still hasn’t been topped. There’s no other.

It’s been 10 years and it hasn’t been replaced by some other, like, impressive move. So there is something to that, where it came up with a genuinely new move. Again, it was part of the search space, you know, of all the possibilities that you could have in the game of Go. But no one else had ever come up with it before. So we could get a lot of different things like that here as well.

And then finally, just putting all of this together: they can write code themselves. We talk about that every week on this podcast. You can just give them a task and let them have at it for as long as you want. It doesn’t make sense for humans to do it. They’re getting better at doing those tasks for a long amount of time. And as you point them at improving themselves, it creates this recursive loop up and down the stack. They can keep

improving how AI research is done. They can manage each other as well, so you can do a lot of this stuff in parallel. And because the search space is pretty wide and we can’t go after everything, they can, even through brute force, solve a lot of these problems that we wouldn’t be able to. And then finally, literally everybody’s saying this except, well, except for Grok, which is

I don’t know what it is saying these days that is not, yeah.

Shimin (39:44) Grok is

fighting for free speech.

Rahul Yadav (39:47) Except for Grok, who’s busy fighting for free speech. Everybody is saying they’re going to automate AI research. OpenAI says they want to do it by September of this year. Anthropic’s actively, you know, working on it. Jack works at Anthropic, so a lot of these views, I’m assuming, are also influenced by what he’s seeing internally. And then DeepMind has also said the automation of alignment research should be done when feasible.

Everything that AI is going to touch is going to get a great, you know, massive productivity multiplier. But just because something is more productive doesn’t necessarily mean that society would benefit from it. So that’s a human problem we have to intentionally solve. Because otherwise you could easily decide that all the compute goes towards this and then

you, you hog up compute that could have gone to other things as well. When we were talking about the Infinity Machine, how, you know, AlphaFold took very little compute and had such a big gain. Now we’re in a world where a lot of compute is going towards what at the end of the day are chatbots. And then finally, a capital-heavy and human-light

economy. If something can improve itself and keep working on itself, you start questioning human labor, like, where are the cases you really need that, versus today, when we obviously need more human labor. In the future, you’ll be able to throw money at it and push that much further than what you can do with human labor today. So

Given all that, Jack puts the probability of this happening by 2027 at 30%, by 2028 at 60%. It’s still pretty close to somewhat of a coin flip, and maybe we run into things that are, like, fundamental,

you know, bottlenecks, something we have to have fundamental breakthroughs for. But overall, the patterns look like we’re going to a place where we would be able to brute-force our way through a lot of these things and really improve how AI research is done over the next couple of years.

Shimin (41:48) Maybe the people don’t yearn for AI. Maybe they yearn for thermonuclear exchanges at this point. Because if this is, I know, Dan looks surprised. One of the things I spoke about when we first talked about the MiniMax self-training was just, like, the fundamental phase shift from labor-dependent, PhD-dependent AI research into

Rahul Yadav (41:54) ⁓ god.

you

Shimin (42:14) capital-intensive AI research. And given our track record of capital consolidation, maybe I’m wearing American-colored lenses here, that does not necessarily bode well for societal stability. So maybe that’s why the people will, you know, yearn for thermonuclear exchanges. Not saying me, but it might be time to, like, you know, maybe read a book on capitalism to see

Rahul Yadav (42:27) Hmm.

Dan (42:35) Ha ha ha.

Rahul Yadav (42:39) Hehehehehe ⁓

Shimin (42:40) to see where things

may lead.

Dan (42:42) Do you remember when this podcast was about AI tools and now somehow it’s about nukes? It’s interesting how things change. Yeah.

Rahul Yadav (42:48) That’s the world we live in.

Dan (42:51) 30 episodes in and we’re ready to fire nukes at each other.

Shimin (42:55) Not at each other, at some imaginary data centers in the middle of nowhere.

Dan (43:01) How do you know

I don’t live in an imaginary data center?

Shimin (43:03) Yeah, honestly, like, this seemed like such a fairy tale even six months ago when we started this podcast, right? Like, now we’re talking about it like it’s a real possibility, and the signs are there if you look for them. So where would episode 60 lead us, guys?

Rahul Yadav (43:04) Hehehehe

bit.

Yeah, an interesting…

Dan (43:22) You

if Shimin hasn’t fired nukes

at us before then.

Rahul Yadav (43:25) An interesting thing was, yeah, right here at the last sentence in the second paragraph from the bottom, where Jack’s calling out like, or we run into some fundamental deficiency within the current technological paradigm, and it will require human invention to move things forward. Similar to the…

You know, we talk about the rich becoming more and more rich and the difference between how much money people have. There could be a world where either you’re so smart that you can operate with these things, or tough luck, right? Because if all the middle gets eaten by AI that can do long-horizon tasks and can do all these things,

Shimin (43:59) Mm-hmm.

Rahul Yadav (44:08) not being smart will be pretty taxing. It already is in today’s world, but it would only get worse if intelligence becomes the currency to be able to do these things.

Shimin (44:19) Good thing I’m growing a lot of tomatoes. AI’s not coming for that yet.

Rahul Yadav (44:21) Yeah.

Claude did grow a tomato in that space experiment. I don’t know if you guys remember. Yeah, for 30 days or 90 days the model was just figuring things out and running it, and they grew a tomato.

Dan (44:24) yet.

Shimin (44:29) No, darn it.

That’s pretty impressive.

Dan (44:42) and some metal cubes.

Shimin (44:43) Yeah. well back to the AI tooling and coding world.

Dan (44:47) Yeah. So, everyone’s favorite, Simon Willison, has a recent blog entry called “Vibe coding and agentic engineering are getting closer than I’d like.” So apparently he was just on someone else’s podcast. Probably not as good as this one. Don’t get you. Anyway, it made him think about, just sort of like, vibe coding in general and how,

Shimin (45:00) Simon, come on.

Rahul Yadav (45:03) and disapprove.

Dan (45:10) As soon as that term was coined, he immediately had sort of like a counter post about it where he was like, yeah, well, there’s also agentic engineering where you’re like being a software engineer and applying everything, you know, but then moving faster because you’ve got these tools. and he’s sort of coming to terms with the fact in this, this blog post, something that I feel like I’m also personally coming to terms with is that like,

He doesn’t call it review fatigue in this, but it feels to me like review fatigue, where there’s so much code being created now at such, like, crazy speeds that you just see so much more code than you did. And reading code to me was always, like, an important and large part of the job. But even with that said, it’s just like,

Rahul Yadav (45:48) you

Dan (45:52) It’s become sort of this blur, right? So through that lens, he’s starting to potentially ship things that he hasn’t himself read, or at least put up for review by humans, things that he hasn’t necessarily read. And he brings up a really interesting question that I found interesting enough to feel like it belongs in this, is…

So is that in fact any different than depending on a team of humans, right? So like, let’s say you’re the, I don’t know, front-end team and you have to depend on the back-end team to build you the APIs. There can be bugs in those APIs. Humans created them, right? And you may or may not find those bugs until you exercise them. And the same is actually sort of true of, like, LLM-created code, right? It’s like,

quality sort of happens the same way regardless of anything else. So is that okay or not okay? And you know, he doesn’t really go on to answer that question. I’m not sure we could either, but it was pretty interesting to think about it through that lens: is there a significant difference from that? And you might not dig into the code that had that bug until you ran into it, right? You might just accept that there is an API that, I don’t know,

gives you coordinates for blue squares on the screen or something, right? And so you use it until you realize that it made a blue octagon and you’re like, well, what happened here?

Rahul Yadav (47:07) I think in this analogy, let’s say you’re one front-end engineer and a thousand backend engineers are writing code and you have to then build your whole front-end on top of it. Also flawed, but you won’t be able to read it. You won’t be able to find those bugs either. So I think the sheer overwhelming.

Shimin (47:07) Yeah, I

Dan (47:20) Mm-hmm.

Rahul Yadav (47:28) number of PRs and the lines of code that you have to review, that’s different here. And you’re not even like front and back and split, were at least consuming the API yourself, you were using the product yourself, you had certain expectations of the people, you knew the technical talent of the people.

here you might not and you could say they might be better but only until you get throttled to a legacy model because colossus one decided to take its computer away now that spacex has ipod or you know pick whatever so i think there’s a lot of this like unreliable

Dan (48:00) You

Rahul Yadav (48:07) party or black box that is doing these things and is doing it at such a scale that you cannot realistically verify everything and you cannot be bottling the bottleneck either because then you would become the problem.

before we would be like, yeah, we have fewer front-end engineers than back-end engineers or fewer back-end than front-end, whatever. But you could always rationalize that as like, that’s why we’re moving at X speed. Now it would be like, why aren’t you moving at X times 10 or X times 100 speed? So the pressure is on the human to be able to move as fast as possible. And that’s what gets us here.

Shimin (48:41) Yeah, the pressure is on the human and the responsibility is also on the human. Like if you had a backend team doing the thing, you can say, well, this doesn’t work. And some people like to put the responsibility on the other team more than others. But with AI agents.

Rahul Yadav (48:44) Yes.

Yeah.

Dan (48:53) What?

I’ve never seen that in my entire

Shimin (48:58) Never. With AI agents, this becomes more of a gray area because ultimately you are responsible for the AI agent’s work. There’s no other person to push it towards. Also, I think part of the reason why Simon feels much more comfortable with AI-generated code or vibe-coded code now is that it’s a trust issue. Like, if the back-end software engineering team has proven to be reliable over the last

Rahul Yadav (49:06) Yeah.

Shimin (49:24) six months with very little guidance needed, then you tend to trust them more. Even though in this case, it’s probably not correct to trust a thing that is not sentient. But as a tool, yeah, I find myself trusting it more and more. These days I find myself using Claude Code with --dangerously-skip-permissions. It’s just an alias for regular Claude Code. I just Command-R and do claude and just

run it with --dangerously-skip-permissions, unless I know the task I’m doing is high risk, right? So.

Dan (49:57) You know, I’ve

never once done that.

Shimin (49:58) Then just, just try it once. Just take one hit. Your life will never be the same. No, that’s not true. I admire you.

Dan (50:06) You’d think with all of these

computers behind me, I would find one that I would feel comfortable running that on, but never once done it.

Rahul Yadav (50:09) Peace.

Shimin (50:14) Right.

Rahul Yadav (50:14) A great alias for that would be “fuck it, ship it.” It shouldn’t be this corporate-speak “dangerously skip permissions.” It should be that. That’s what you type in, it spins up Claude, you know it has max permissions, and you just let it rip.

Shimin (50:29) And they do call it… Sorry.

Dan (50:29) Or what was the Facebook

one was like, you’ll be fired if you use this function or something. I forget. It’s pretty good. Yeah.

Rahul Yadav (50:35) really?

Shimin (50:38) Yeah, and he does talk about the normalization of deviance here, which is, you know, it’s really hard to keep yourself on guard at all times if the tool is mostly good. It’s like the self-driving thing. It’s hard to let the car take the wheel 99.9% of the time but get fully back into it the 0.1% of the time when you actually need human supervision.

Rahul Yadav (51:04) And our brains and body, like humans, want to conserve energy. This is less energy intensive than reading through a lot of stuff that is cognitively, you know, very energy intensive. Yeah. So like we’re always going to do the easier thing. It’s just, that’s how things work.

Dan (51:16) Yeah, expensive to parse, yeah.

Not always. I found one situation where I raced Claude on purpose, which was when we had a very tight deadline on a production bug. I was like, you know, because of previous experiences that we’ve talked about on the last few podcasts, I can’t solely depend on this. So I prompted Claude, I let it go do its thing, and then I jumped into the code myself and started

hammering at the same bug, and I beat Claude by about 35 seconds. I was very excited by that. Probably under two minutes total for either of us. I mean, it wasn’t hard to figure out what was going on once I understood it.

Shimin (51:55) How long did the whole thing take? Like 35 seconds out of… Okay, well that’s pretty impressive.

Rahul Yadav (51:55) Nope.

In Claude’s defense,

it didn’t have access to Colossus. Try now.

Shimin (52:11) Yeah,

I was thinking about the fact that, you know, Simon was talking about vibe engineering not long ago. And of course he is a very, very senior and experienced software developer. Now even he is going to completely let Claude take the wheel. It’s almost Shakespearean. It reminds me of the quote from The Tempest.

Like, it starts with, “O wonder! How many goodly creatures are there here! How beauteous agents are! O brave new world, that has such AIs in it.”

Rahul Yadav (52:41) Shimin’s so much more well-read than you and I, Dan. We’re just fucking idiots looking at tweets here and thinking it’s poetry. I don’t read this, whatever this was.

Dan (52:48) True. I can’t even read. mean, I just, you know.

Shimin (52:49) Rahul you read all the time.

Dan (52:54) I have Claude do it for me.

Shimin (52:56) Alright, anything else you would like to add to Simon’s article?

Rahul Yadav (52:59) ⁓ yes,

not to this article, but hopefully I’ll get to review this in a future podcast. At the AI Engineer Europe conference, I think, whatever the one that happened recently, Dexter Horthy, who’s also recently, with the whole, like, AI thing, has been pretty…

Dan (53:08) Rahul’s book club.

Shimin (53:15) Yep. Yep.

Rahul Yadav (53:23) you know, talks a lot about these things and all that. He was a big proponent of agentic engineering, vibe coding as agentic engineering. He went to that conference and then during his talk admitted that yes, he was wrong. They tried it. It didn’t work out, because at the end of the day you’re reading specs, and the end result still has mistakes. So you might as well just read the PRs. And they kind of came full circle on the

dark software factory, and they’re like, don’t try this. So that was an interesting data point from someone who has tried it in earnest and was a big proponent of it. But to his credit, he has changed his mind after seeing some of the things that happened and why it’s better for everybody to just review the code. So it was a very good talk that would be worth discussing in a future podcast.

Shimin (53:53) Mmm.

Yeah, I have

two other YouTube videos of two other talks from that conference on another window that I haven’t gotten to. So maybe, maybe it is, it will be fun to talk about. Maybe it’d be fun to go to the next one.

Rahul Yadav (54:25) should do a roundup. Yeah.

you can go represent our podcast at their podcast.

Dan (54:32) I mean, I think we’ll all

be representing Google Enterprise. What?

Rahul Yadav (54:37) AutoMe agentic platform

Get you some no money off discounts ran out last week

Dan (54:41) Hahaha

Shimin (54:44) Yeah, well, my post for this week, from James Shore, titled “You’ll need AI that reduces maintenance costs,” ties in nicely with the previous article. James makes the point here that just because you are able to create more code now doesn’t mean your bottleneck

Dan (54:44) Yeah, true.

Shimin (55:06) of maintaining existing code and existing features has changed. He has a very simplistic graph here that shows that if you double your productivity when it comes to generating new code, but the hours required to maintain your entire code base don’t change, then once you adopt agentic coding or vibe engineering, the amount of time you spend maintaining code would hit

50% within like 10 to 12 months. And this really spoke to me because I think I talk a lot about, you know, having to maintain my Pi agent, the skills in my Pi agent, various other skills for Claude Code, and various side projects. The amount of just, like, random chores I have to do really has increased dramatically over the last,

say, four or five months. It’s that old saying: AI allows you to ship all your side project ideas 10 times faster, so you can abandon your side projects 10 times faster. That’s kind of been true. And so what he is saying is that in order for AI agents to give us a true productivity increase,

the amount of time that it takes to maintain your code base must decrease by the inverse of the productivity increase. So if it makes you three times as productive, you must be able to maintain the code base in one third of the time. Because otherwise the share of, you know, time spent doing maintenance just increases exponentially. And it also ties into our conversation about how code is free like a puppy, right?

How many puppies can you get until you spend your entire day cleaning after them and taking them on walks and stuff? And that’s not really the kind of future I would want to live in.
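The argument Shimin summarizes can be played with in a toy simulation. This Python sketch is not James Shore’s graph or his numbers: the model, the `upkeep_per_unit` rate, and the 160-hour monthly capacity are all assumptions chosen for illustration. It only shows the shape of the claim, that faster output with unchanged upkeep pulls the maintenance share up sooner, while cutting upkeep by the same factor brings the curve back to the baseline.

```python
def maintenance_share(months, productivity=1.0, maintenance_divisor=1.0,
                      upkeep_per_unit=0.02, capacity=160.0):
    """Fraction of monthly capacity spent on maintenance at the start of each month.

    ASSUMPTIONS (toy model): one hour of feature work at productivity 1.0 ships
    one unit of code, and every shipped unit costs `upkeep_per_unit` hours of
    maintenance per month forever, divided by `maintenance_divisor`.
    """
    maintenance = 0.0
    shares = []
    for _ in range(months):
        shares.append(maintenance / capacity)
        feature_hours = max(capacity - maintenance, 0.0)
        units_shipped = productivity * feature_hours
        added_upkeep = upkeep_per_unit * units_shipped / maintenance_divisor
        maintenance = min(capacity, maintenance + added_upkeep)
    return shares


def first_month_at(shares, threshold=0.5):
    """Index of the first month whose maintenance share reaches the threshold."""
    for month, share in enumerate(shares):
        if share >= threshold:
            return month
    return None


if __name__ == "__main__":
    horizon = 60
    scenarios = {
        "baseline": maintenance_share(horizon),
        "2x output, same upkeep": maintenance_share(horizon, productivity=2.0),
        "2x output, half upkeep": maintenance_share(
            horizon, productivity=2.0, maintenance_divisor=2.0
        ),
    }
    for label, shares in scenarios.items():
        print(f"{label}: maintenance reaches 50% in month {first_month_at(shares)}")
```

With these made-up parameters the exact month counts will not match the 10 to 12 months mentioned above; what carries over is the ordering of the three scenarios.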

Dan (56:53) Two is plenty.

Shimin (56:54) Two, yeah, but I don’t have two now. That is probably more than enough. Yeah. And he doesn’t really talk about, you know, what we can do about it. But I do think if this hypothesis is true, and my gut says it is, we are going to see some dramatic velocity decrease out of

pretty much all software companies within the next 6 to 12 months. So we’ll come back in 12 months and see if the industry is trying to grapple with a maintainability issue.

Dan (57:30) Just look at GitHub web hooks. Uptime.

Shimin (57:33) Right, that’s only been around

for so long. Like it’s only gonna get worse. I guess it’s better to say. Yeah.

Dan (57:37) Yeah.

Start counting the number of days it’s up instead of down, I feel like.

Rahul Yadav (57:44) If you have long-running agents, would they not be able to continuously do maintenance?

Dan (57:52) I’ve sort of been thinking that that’s probably the best use case for them. But I think it depends on what you define as maintenance, right? Because I don’t think you can, at least not yet, point a long-running agent at it and be like, make meaningful refactors to this code base that make it easier to maintain over time. Right? Probably not going to do so well at that, but update all my, you know,

Rahul Yadav (57:57) Yeah, like why can’t you throw money at the problem?

Uh-huh.

Dan (58:16) dependencies that were just hacked in a supply chain attack, great, like probably fine for that kind of stuff, you know? Yeah. my goodness. And expect more of that too, right? Cause those type of attacks are gonna, they’ve already gotten cheaper and easier and we’ll continue to do so.

Shimin (58:21) I’m glad Tanstack is making a presence here.

Rahul Yadav (58:32) Yeah.

Shimin (58:34) Yeah. And Rahul, to your point, yeah, I was thinking about the same thing. Like, what can AI do to help us reduce maintenance, right? They can easily do better documentation, more interaction. You’re going to have a chat agent for your documentation, for your code base. That all in theory reduces onboarding and maintenance time. We have talked about OpenAI’s slop garbage collection step that they have in their AI agent harness, which seems like a natural fit for this.

Rahul Yadav (58:57) Hmm.

Shimin (59:01) And of course you can do a full-on AI-based refactoring, but until the whole process is completely automated, whatever manual part remains will become the bottleneck. And it’s really this brave new world where we trust the AI code, like, almost 85 to 90% of the time, but we do not trust the refactoring as much, maybe only 60 to 70% of the time.

Rahul Yadav (59:15) Hmm.

Shimin (59:28) And then we spent all of our days. Yeah.

Dan (59:29) I’m not saying it can’t. Yeah, I’m not saying it

can’t refactor. I’m just saying that like it might not, at least in my experience, won’t necessarily pick the right abstractions, right? Or know when an abstraction is useful versus when it’s not. I think we still haven’t quite gotten there with models. And then so like the pattern I see a lot is it doubles down on either the poor abstraction it’s taken or the like sort of like that over editing thing that we’re talking about.

Rahul Yadav (59:40) Yeah.

Shimin (59:43) Yeah, I’m… Yeah.

Dan (59:54) or compounds problems because it refuses to give up a poor abstraction. Like, a very simplistic example of that: I’ve been working on automating all the lights in my house with Home Assistant, and it created a factory method that returns the, like, standard light automation template that I’m using. And I just sort of took it at face value. Yeah, that’s fine. Like, it’s a function that returns another function to you all the time. And then I looked at it.

The function that returned the other function had a two-argument difference.

And I’m like, why did you do that? Why didn’t I catch it? You know, it’s just like there was zero point to have a, an entire extra method returned. Like, so yeah, kind of that kind of thing. I, yeah, clearly it was bad prompting.
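
To make the example above concrete, here is a minimal Python sketch of the kind of pattern Dan describes. The transcript does not say which language or names were involved, so `build_light_automation`, `make_light_automation_factory`, `light_automation`, and the dictionary fields are all hypothetical stand-ins rather than his actual Home Assistant code.

```python
from typing import Callable


def build_light_automation(entity_id: str, brightness: int, transition_s: int) -> dict:
    """Hypothetical template builder; the field names are illustrative only."""
    return {
        "entity_id": entity_id,
        "brightness": brightness,
        "transition": transition_s,
    }


def make_light_automation_factory(
    brightness: int, transition_s: int
) -> Callable[[str], dict]:
    """The over-abstracted shape: a function that returns another function,
    where the outer layer only pre-fills two arguments."""

    def factory(entity_id: str) -> dict:
        return build_light_automation(entity_id, brightness, transition_s)

    return factory


def light_automation(entity_id: str, brightness: int = 200, transition_s: int = 2) -> dict:
    """The flatter alternative: one function with default arguments."""
    return build_light_automation(entity_id, brightness, transition_s)


# Both routes produce the same template for the same inputs.
via_factory = make_light_automation_factory(200, 2)("light.kitchen")
direct = light_automation("light.kitchen")
print(via_factory == direct)
```

If the two fixed arguments never vary between call sites, the extra layer buys nothing, which is the kind of thing that is easy to wave through in review.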

Shimin (1:00:35) Just get good. Just get good at prompting then.

You know, we probably can remediate some of that around the process, having better context management for what good refactoring looks like, all of that. But at least as of today, we’re not there yet. I’m not saying we don’t trust it, but I trust it significantly less with refactoring than I trust it to, like, get something done. Yeah.

Dan (1:01:00) and we kill the beast.

Shimin (1:01:01) We’ll find out. All right, if there’s nothing else.

Dan (1:01:05) Move on to our second nuclear exchange of the evening.

Shimin (1:01:08) Yeah, let’s go for it.

Dan (1:01:09) First being Shimin having fired nukes at us. The second being Two Minutes to Midnight, where we talk about how close we are to the collapse, non-collapse, whatever. Are we even in an AI bubble anymore? Bubble bursting through the lens of the Atomic Bureau of Scientists, is that right?

Shimin (1:01:15) Existential dread thoughts, yes.

Dan (1:01:31) 50s. So as a quick reminder, as we get to midnight, that means that the bubble is bursting. Oracle has collapsed. I think that’s what Shimin thinks is the leader of it. And the further we get away from midnight, the less likely we are or less close we are to it.

Shimin (1:01:40) Mm-hmm.

Yep.

And we are at four minutes as of last week. So I mentioned this is going to be an Elon-heavy episode. You know, I was an Oracle man for this past month and a half. Now I, yeah, now I may be a SpaceX man. Here I have a Wall Street Journal article titled “Elon Musk’s Grok Is Losing Ground in the AI Race.”

Dan (1:01:50) week.

Pun intended.

Shimin (1:02:11) The Wall Street Journal article has some very interesting numbers. The total number of paid Grok app downloads topped out at around 20 million in January of this year. And as of April, it sits squarely at around 8 million downloads. So not a good sign for the most free-speech-aware large language model out there on the market. The data is coming from Appmagic. And in another survey of 260,000 US consumers who use AI, the percentage of respondents who said they pay for Grok remained mostly flat at 0.174%, which is almost exactly the same as it was a year ago.

Compare that to OpenAI, where more than 6% of respondents said they pay for ChatGPT. These are not good numbers. So Grok is about 1/30th the total user base of ChatGPT. And there’s also a quote from Ben Pouladian, an engineer and tech investor based in LA, saying, “OpenAI is Coke, Anthropic is Pepsi, and Grok is RC Cola.” That about sums up how I feel. And I don’t think RC Cola is necessarily long for this world.
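
As a rough check on the figures quoted above, the arithmetic can be sketched in a few lines of Python. The inputs are the numbers as read out on the show; treating “more than 6%” as exactly 6.0 is an assumption, and the ratio describes paid penetration among survey respondents, not total users.

```python
# Figures as quoted in the episode (WSJ / Appmagic / consumer survey).
peak_downloads = 20_000_000      # January
april_downloads = 8_000_000      # April
grok_paid_pct = 0.174            # percent of respondents paying for Grok
chatgpt_paid_pct = 6.0           # assumption: "more than 6%" taken as 6.0

# Relative drop in downloads from the January peak.
decline = (peak_downloads - april_downloads) / peak_downloads

# How many times larger ChatGPT's paid share is than Grok's.
paid_ratio = chatgpt_paid_pct / grok_paid_pct

print(f"Download decline: {decline:.0%}")
print(f"ChatGPT paid share is about {paid_ratio:.1f}x Grok's")
```

On these inputs the paid-share gap comes out a little wider than the “1/30th” mentioned on air, closer to 1/34th.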

Rahul Yadav (1:03:37) Is Anthropic not coke?

Dan (1:03:38) I just drank one this weekend, weirdly.

Rahul Yadav (1:03:40) I would want, Anthropic should be Coke

Shimin (1:03:43) That only happened in the last couple of weeks.

Rahul Yadav (1:03:45) I guess so. OpenAI does have a larger distribution than Anthropic among consumers, I guess.

Dan (1:03:52) Maybe the fact that I drank an RC is what inspired me to make an xAI account. Wild.

Shimin (1:03:54) Hahaha

Rahul Yadav (1:03:57) Anthropic is NA beer, only the cool kids drink it.

Shimin (1:04:04) As an NA beer drinker, I resent that. Lastly, of course, Grok is also not doing well in the enterprise market. Can you guys imagine if you’re reading a job description and they said we are a…

Rahul Yadav (1:04:07) Hahaha.

Yeah. ⁓

Shimin (1:04:20) SpaceX/xAI-only shop, so you cannot bring your own tokens from Claude Code or OpenAI. Like, I would just probably not work for them.

Dan (1:04:29) It depends. Like, were you writing maximally truth-seeking code or, you know, just boring normal applications?

Shimin (1:04:36) Yes, I just need to be a better truth seeker. I agree.

Rahul Yadav (1:04:40) You ask it whatever, it’s leaking secrets left and right because it just wants to tell the truth to everybody.

Shimin (1:04:46) Then of course, you know, in January, Grok got all those additional downloads because it was doing the nudification-of-a-picture thing. Remember that? And there were, yeah, other things of an adult nature. And even adult content cannot save Grok here. This is the one time in internet history where porn did not spread a piece of technology to the general populace.

Rahul Yadav (1:05:14) Thank

Shimin (1:05:15) All right, that is my article. Next up, Dan.

Dan (1:05:18) Yeah, so this is a little bit more speculative, but supposedly Anthropic has agreed to pay Google 200 billion for its chips and cloud access. I don’t think there’s been any formal acknowledgement of this. It’s coming from The Information; they’ve reported that the deal has been made and that that puts the revenue backlog at two trillion across Amazon, Google, Microsoft, and Oracle.

Rahul Yadav (1:05:46) Revenue backlog means they owe this much in commitments.

Dan (1:05:51) Yeah, I think if like these deals, I mean, that’s why I say, you know, sort of like allegedly inked, right? Cause we’ve sort of talked about some of these in the past and they’ve either fallen through or like not been as real of a deal as we’ve talked about. So who knows? But that was the part that stood out to me was the revenue backlog of $2 trillion. I mean, we’re approaching like,

Shimin (1:05:52) Mm-hmm.

Dan (1:06:13) 1990s national debt levels at that point, you know, like.

Shimin (1:06:16) Hahaha

Rahul Yadav (1:06:17) That’s what I’m going to start telling any like credit card people or anyone I owe money to. It’s revenue backlog. It’s not debt. What are you asking me for? It’s just in the backlog. We’ll get to it.

Dan (1:06:26) Ha

Yeah, we’ll get to it.

Shimin (1:06:33) Two trillion dollars. Is that even a lot of money these days? Who can tell? And Rahul, you have an article from Ars Technica.

Rahul Yadav (1:06:40) Yeah, Silicon Valley is putting $200 million into AI data centers that will float in the ocean. This is by Jeremy Hsu at Ars Technica. The company is called Panthalassa, which is aiming to test these AI computing nodes in the Pacific in 2026.

Honestly, when I read this, I was like, $200 million. What are we even talking about? Is this like someone’s, you know, project they’re doing on the side, given how much money is getting thrown around these days? But it is a smart move. Land is limited. We have more water than land. And you bypass all these building permits and everything. There are going to be challenges because, you know,

Dan (1:07:07) Chump change, yeah, side project.

Rahul Yadav (1:07:24) water is more fluid than land. And you deal with the salinity and all these things. Also, I think they’re doing satellite links, so you’re going to be limited in how much bandwidth you would have. And then, yeah, all sorts of crazy shit that happens. And also

Dan (1:07:40) Not to mention storms.

“Why is Claude slow today?” Hurricane.

Shimin (1:07:48) I was thinking the same thing.

Dan (1:07:50) You

Rahul Yadav (1:07:51) I would almost treat any data center that I cannot police in a different category where it cannot have anything critical. Because if someone can attack it, someone can take it over or hack it easily or all those things. Pirates can go pirate the thing because they go.

Dan (1:08:09) A new age of GPU piracy.

Shimin (1:08:11) It’s Waterworld. Waterworld is back.

Rahul Yadav (1:08:14) they go where the money is, you know? And so, every time you put this there, you’re almost like staking a claim on some like water territory you have to police and there’s all sorts of security implications to it, so.

Dan (1:08:18) I can’t wait to read that headline: “Mysterious, entirely blacked-out freighter ship walks away with 200 gigawatts of compute.”

Rahul Yadav (1:08:32) Yeah. The “I am the captain now” guy just looks at the Claude or something.

Dan (1:08:41) Now, combining this with the previous article, we know how xAI is going to come back. They’re just going to steal this water compute and the…

Rahul Yadav (1:08:42) So yeah, I was…

Shimin (1:08:42) That’s very exciting. How do I sign up?

Rahul Yadav (1:08:50) Yeah, throw whatever in the… yeah.

Shimin (1:08:51) via rockets.

Yeah, I like this idea. I really actually quite like this idea.

Rahul Yadav (1:08:56) Yeah.

Dan (1:08:58) It’s more realistic than in space. Yeah. Because it’s like we have the lift capability to do this, you know? But.

Rahul Yadav (1:08:59) I mean, it’s better than space. Yes.

Shimin (1:09:05) Yeah.

Rahul Yadav (1:09:06) And even if it fails and you get some technical breakthroughs that help with other things, great, you know. It’s $200 million. This is nothing. We’ve got two trillion of revenue backlog over there. So this is like a few hours of Claude usage.

Dan (1:09:14) Yeah, but the thing that… That’s true. The thing that worries…

Shimin (1:09:15) Yeah.

Dan (1:09:21) Worries me?

But nobody ever thinks about the second-order effects, right? So it’s like, yeah, you’re getting this quote-unquote free cooling, but it’s also at the cost of heating the ocean. And if they deploy these things at any kind of scale, all of a sudden that’s going to increase ocean heating, which is already hitting tipping points where they’re concerned about, like, a 50% chance the Atlantic current stops, which is like, whoa. So.

Rahul Yadav (1:09:32) Yeah.

Shimin (1:09:46) Are they not capturing wave-based energy generation? Are they not doing that kind of renewable generation? Is it only a cooling advantage?

Rahul Yadav (1:09:55) They’re doing that, but no, both. For energy, they’re using the wave energy, and then for cooling, they’re going to use the ocean as a sink, is the idea.

Shimin (1:10:05) That’s pretty sweet, in my opinion. And then, like, laying data communication cables under the ocean is a solved problem. We know what’s not solved: piracy. So look out for that headline, guys.

Rahul Yadav (1:10:18) AI pirates. Johnny Depp comes back 20 years later. He’s an AI pirate now.

Dan (1:10:18) soon, maybe next year.

Shimin (1:10:20) Yeah. All right. All that.

Dan (1:10:26) GPU piracy.

Shimin (1:10:26) All that said, how do we feel about the AI Two Minutes to Midnight this week?

Rahul Yadav (1:10:31) Same.

Dan (1:10:32) I’m torn because $2 trillion backlog doesn’t seem too feasible to me. But the fact that we’re still talking about $200 billion deals means that the money hasn’t dried up since last week either.

Shimin (1:10:45) No, yeah, maybe this is a false sense of security, but I agree with you. I feel like the money spigot is still pretty much full on, and no major red flags, no companies going… like, Anthropic solved their compute crunch, you know, for Christ’s sake. So I feel pretty good about going back like a minute, maybe even two.

Dan (1:10:55) Mm-hmm.

Shimin (1:11:08) I feel about the clock the way I feel about Claude Code. There have been no issues for so long, I’ve been lulled into a false sense of security.

Dan (1:11:08) It’s gonna be like…

two hours to midnight.

Shimin (1:11:17) Maybe not two hours.

Dan (1:11:18) Yeah.

So six.

Shimin (1:11:19) Six? Yeah, let’s do it. Big swings. Sold.

Dan (1:11:22) Going once, twice. Any Rahul objections?

Rahul Yadav (1:11:26) Nah. Do whatever with this one.

Dan (1:11:27) Alright. Do whatever. Okay, with that resounding…

Shimin (1:11:28) All right, we’re still.

Well, the clock has been set, and that’s the show. So thank you for joining us for our session this week. If you like the show, if you learned something new, please share the show with a friend. You can also leave us a review on Apple Podcasts or Spotify. It helps others discover the show, and we really appreciate it. If you have a segment idea, a question for us, or a topic you want us to cover, or if you’d like to get on the show, as long as you’re not doing it through an intermediary and

Dan (1:11:35) Yes.

Shimin (1:11:59) You have an actual internet blog with interesting articles that we can discuss. Shoot us an email.

Rahul Yadav (1:12:05) And especially if you’re pro-Grok, we would love to hear from someone who disagrees with us.

Shimin (1:12:08) Yes, we could use some First Amendment warriors on the show. Just kidding. But maybe not. Maybe it’ll be fun. Shoot us an email at humans at adipod.ai. We’d like to hear from you. You can find the full show notes, transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening, and we’ll catch you next week. Bye.