Episode 22 · April 24, 2026

Is Claude Opus 4.7 Mythos Distilled, Running Qwen 3.6 Locally, and the AI-On-AI Arena

Claude Opus 4.7, mythos slice, Qwen 3.6 35B, A3B, MoE, llama.cpp, Unsloth, GGUF, Pi Agent, Pelican benchmark, Simon Willison, cal.com, closed source, AI Security Institute, mythos, Jesse Vincent, rules and gates, hooks, superpowers, vibe coding, HIPAA, DSGVO, Kyle Kingsbury, Jepsen, future of lies, AI on AI arena, delusion index, Grok, Gemini 3.1, GPT-5.4, Stargate, Epoch AI, data center construction, Paul Graham, railroad bubble, Anthropic, two minutes to midnight


This week Shimin, Dan, and Rahul ask whether Claude Opus 4.7 is a distilled mythos slice and watch Simon Willison’s Pelican-on-a-bicycle benchmark break for the first time as Qwen draws a better bird, set up Alibaba’s open-source Qwen 3.6 35B A3B locally at 90-95 tokens per second as Shimin’s new Pi Agent driver after the Anthropic OAuth revocation, run an AI-on-AI Arena where 11 frontier models grade themselves on AI’s second- and third-order societal impact (Opus 4.7 wins, Gemini 3.1 finishes last), cover cal.com going closed source over mythos-class security risk, Jesse Vincent’s rules-and-gates technique for stopping agents from weaseling out of preconditions, a HIPAA-violating German vibe-coded patient portal as proof “the bullshit future is already here,” Kyle Kingsbury’s “the future of everything is lies” essay, Dan’s confession about cognitive debt in the wild, and Paul Graham’s chart pinning AI capex at ~1% of GDP versus US railroads peaking at ~10%.

Takeaways

Qwen 3.6 35B A3B running locally at 90-95 tok/sec via llama.cpp + Unsloth GGUF — Shimin’s new Pi Agent driver after Anthropic revoked third-party OAuth on Claude Code subscriptions
Simon Willison’s Pelican-on-a-bicycle benchmark broke: Qwen 3.6 drew a better pelican than Opus 4.7, the first time the stronger-model-better-bird correlation has failed
Opus 4.7’s token-burn reputation may be overblown: despite a new tokenizer that reportedly emits about 3x the tokens for plain English, its stricter instruction-following used about 1/8 the reasoning tokens of Qwen 3.6 and roughly 40x fewer than Opus 4.6 on Shimin’s SVG tests (35¢ per run vs $2 for 4.6)
UK AI Security Institute confirmed mythos completed a 32-step Fortune-500-style network attack 3 out of 10 times at $12,000 per attempt ($125k for all 10 runs)
Drew Breunig’s three-phase dev/review/hardening cycle suggests software gets expensive again on the hardening side as mythos-class agents make open repos easier to exploit; cal.com going closed source may be the leading edge of that trend
Jesse Vincent’s “rules and gates”: reformulate optional preferences as directional preconditions (“X → Y → then act”) so agents stop weaseling out via rationalization; hooks differ by being deterministic and programmatic
A German vibe-coded patient portal with database credentials inlined into client-side HTML — HIPAA/DSGVO nightmare and proof “the bullshit future is already here”
Dan’s confession: couldn’t solve a prod bug a pre-LLM colleague fixed in 5 minutes; his instinct was to double down on tooling rather than slow down — the cognitive-debt failure mode in the wild
Paul Graham’s log-scale chart: AI capex is ~1% of GDP vs US railroads peaking at ~10% — there’s “another 40 years and 9 points of GDP to engulf” before this matches that bubble’s scale

Chapters

(00:00) - Cold open and intros
(02:14) - Qwen 3.6 35B A3B running locally
(07:51) - Claude Opus 4.7: the Mythos slice
(12:25) - Simon Willison’s Pelican benchmark breaks
(17:30) - cal.com goes closed source
(19:00) - $12,000 per attempt: Mythos vs Fortune 500
(23:30) - Technique Corner: Rules and Gates (Jesse Vincent)
(27:09) - Vibe coding horror: the HIPAA patient portal
(32:55) - Kyle Kingsbury: the future of everything is lies
(38:26) - Dan’s confession: cognitive debt in the wild
(42:56) - Vibe & Tell: AI-on-AI Arena
(49:32) - Two Minutes to Midnight: data centers and railroads
(59:31) - Closing the clock at 3:30

Transcript

Shimin (00:15) Hello and welcome back to Artificial Developer Intelligence, a weekly conversation show about AI and software development. We go through hundreds of links and dozens of newsletters on AI every week, so you don’t have to. I am Shimin Zhang and with me today are my co-hosts. Dan “much of the bullshit future is already here” Lasky. And Rahul “he wants AI to be a civilizational destiny story, which makes him energetic but weakly calibrated” Yadav. How are we doing today, gents?

Dan (00:45) I don’t think I could. Yeah, he doesn’t always get me with them, but he really…

Rahul Yadav (00:47) You got Dan with that one.

Hehehehehe

Dan (00:53) I’m doing well, thank you for asking.

Ugh.

Rahul Yadav (00:56) It’s finally getting nice in the Pacific Northwest. We’re almost there.

Dan (01:00) Cheers.

Shimin (01:01) I know.

Yeah, it’s like 80 degrees in my room right now because I have the window, the skylight, which you can see via video. And that’s steamy. But yes, loving this weather. On today’s show, as per usual, we are going to start with the news threat mill where we’re going to talk about Alibaba’s new Qwen 3.6 35B model, as well as the newly released Claude Code Opus 4.7. And lastly, some AI security news following our Mythos segment last week about cal.com going closed source.

Dan (01:35) Then we have a quick hop into the technique corner where we’re going to be talking about rules and gates. And we’ll find out what that means then.

Shimin (01:42) Yep. Followed by post-processing where we’re going to talk about some AI vibe coding horror story and the future of everything is lies.

Dan (01:50) Then next up we find out what Shimin has been doing with his, apparently, we just found out, $200-a-month Claude subscription on Vibe & Tell.

Shimin (02:00) Not even, I didn’t even use Claude Code for the whole thing. But lastly, we’re going to do Two Minutes to Midnight, where everyone will talk about where we are in the AI bubble cycle. Is there a bubble? Maybe there is no bubble. Anywho, we’ll find out.

Dan (02:13) Let’s get at it.

Shimin (02:14) All right, first up on the 14th, which is last week, Alibaba released a Qwen 3.6 35B A3B. I’m not really sure what the A3B stands for. It’s a 35 billion parameter mixture of experts model. It is the open source version of the Qwen 3.6 Pro.

Dan (02:21) A3B.

Shimin (02:34) or is it Max, which was released two weeks ago. And that one was closed source. So this is the smaller open-weight, kind of everyday-usage model that the Qwen team has released. Usually we look at these models and we look at the benchmarks, say this looks pretty good, and we move on. One thing to know about this model: it is multimodal.

Dan (02:55) You might move

on, I go run it on my framework desktop.

Shimin (02:59) Well, what I was going to say is from the benchmark, it looks to be on the Gemma 4 level and it’s also a Gemma 4 size model, especially given that it’s a mixture of experts. So at any given time, only 3 billion parameters are actually active. So what I actually got a chance to do is, you know, last week I was talking about how the

Dan (03:01) As you finally ran one, finally.

Shimin (03:25) Pi Agent’s Anthropic OAuth connection no longer works. And like, that really sucks, because I had to migrate over to use Claude Code for everything. And this week, I bit the bullet and set up the local Qwen 3.6 model as my main driver for my Pi Agent.
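On the "only 3 billion parameters are active" point: in Qwen's naming, the "A3B" suffix denotes roughly 3 billion activated parameters per token out of 35 billion total. The sketch below illustrates the general top-k routing idea behind mixture-of-experts layers in plain Python. The number of experts, the router scores, and the value of `k` are made-up values for illustration, not Qwen 3.6's actual configuration.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (expert_index, weight) pairs.

    Only the selected experts would run for this token; the rest stay idle,
    which is why the active parameter count is far below the total.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the chosen experts only, so their weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Figures taken from the model name: 35B total, ~3B active per token.
TOTAL_PARAMS_B = 35
ACTIVE_PARAMS_B = 3
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Hypothetical router scores for four experts, routing to the best two.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)
print(f"active share of parameters: {active_fraction:.1%}")
```

The active share comes out to under a tenth of the weights, which is consistent with a 35B model generating at speeds closer to what a much smaller dense model would manage.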

Dan (03:42) How did

you get tool calling to work with Pi? Because I played around with that and couldn’t get tool calling to work. What did you get hosted on?

Shimin (03:50) It is llama.cpp with the Unsloth version of the model itself. Yeah, I didn’t find any issues with tool calling. What was especially useful for me is, you know, gut check, I feel like this is a Sonnet-level model. It was able to read all the skills, understand all the skills, and make skill configurations to update itself to work with Qwen.

Dan (03:57) Yeah, like the GGUF clients.

Hmm, interesting.

Shimin (04:17) And I believe I was talking about last week how, you know, if you still miss your Opus 4.7-level kind of large model, you can just get your Pi Agent to call Claude Code and send back the result. So I asked Qwen 3.6, like, hey, install Claude Code, kind of do some Docker configuration to get my native Claude Code credentials mounted. But after that was all set up, I asked Qwen 3.6 to, you know, get me the weather in Lewistown, Montana using Claude Code and it just worked. It was very successful.

Dan (04:52) I think Qwen

could have handled the weather by itself.

Shimin (04:55) You know, it’s a Chinese model. Sometimes they may need help for US data. I don’t know. So overall, I’m fairly impressed with the model. I think this will be my new daily orchestrator model, barring any other latest model development.

Dan (04:59) Yeah, okay.

I know we’re way off script here, I, there is no script. I actually just got over the weekend got.

Jumpjump for running locally, cause I was curious about it. Like the biggest size, I think it’s only 36B, right? ⁓ And I was actually really surprised at how slow it is, which makes me think I’m doing something wrong. Like I was barely pushing 50 tokens on ⁓ inference out of it, which doesn’t seem right. So there must be something wrong about the config that I used.

Shimin (05:20) Mm-hmm.

Yep. Yep.

Hmm.

50 tokens is not terrible. I was getting…

Dan (05:42) But it’s

a tiny model. Like I’ve run like 108 billion parameter ones on that same setup and gotten like, you know, 10 to 20 tokens. Something like that, yeah.

Shimin (05:53) Interesting. Yeah.

It would be great to do a deep dive sometime about token out plus B. Qwen was giving me 90, 95 tokens per second, which I think is pretty decent.

Dan (06:03) Well,

you’ve got a way faster actual GPU than I do. I just have a boatload of RAM sitting on an integrated GPU.

Shimin (06:09) Yeah, that too.

The entire model did not fit in my GPU either. I was offloading a lot of it into regular RAM.

Dan (06:14) Hmm. Hmm.

Now that we’ve lost Rahul, let’s keep going. It’s like you guys are nerds. Why am I on this podcast anyway?

Rahul Yadav (06:21) No,

so part of what Dan was saying, like, I’m used to this news every week, so, you know, like, fine.

But then one part of me was like at what point do these open sources make sense because you have to like you know versions help you identify different things but at what point does opus 4.7 or gpt 5. whatever just becomes opus and gpt and you don’t have versions it just is another thing that you expect to get better over time and if it degrades if you’re having compute struggles then it’s also the same thing and you’re like it’s the thing

It’s got no other, you know, there’s no major minor versions attached to it. We expect the thing to do the thing.

Dan (07:01) Imagine the…

Imagine

the internet ballyhooing if people didn’t have a version number to tie it to. I’m done with XYZ model. They’ve just, I don’t know what they did to it, but it’s…

Shimin (07:10) Mm-hmm. Yep.

Rahul Yadav (07:12) Yeah,

but especially for…

Shimin (07:18) But, but Dan, this is where you were saying last week about seizing the means of token production or token generation. Like I like having a weaker model that I know is on my computer and will never get degraded.

Dan (07:22) You

Rahul Yadav (07:23) Yeah.

Dan (07:30) Yeah, it’s true.

Rahul Yadav (07:30) Yeah,

yeah.

Shimin (07:31) That said, I did all that work just to hear rumors today that Anthropic is potentially allowing its Claude Code usage on third-party harnesses again. I potentially did all this work for no reason, but that’s for next week’s use.

Dan (07:43) really?

Well, there also may be some other news coming out of them, which, if that’s right, you’re going to hear a Dan rant about that too.

Shimin (07:51) Yes.

Shall we move on to potentially the biggest model news of these past few weeks?

Dan (07:58) Yeah, so speaking of Anthropic, they have dropped what they’re calling a sort of limited slice of Mythos, essentially. So they’ve supposedly dumbed it down enough that it’s safe enough to release to the masses. So we now have Claude Opus 4.7.

Shimin (08:08) Mm-hmm.

Dan (08:19) Again, I think rather than just going through the benchmarks and everything, you can read those yourselves and whatever. It’d be more interesting to talk a little bit about like sort of like how things are being perceived with this. And like what I’ve seen so far is a lot of people are complaining quite a bit about how fast it burns through your ⁓ token usage, which is kind of interesting.

Shimin (08:38) Mm-hmm.

Dan (08:41) Mostly because it’s like, I didn’t realize that happens every time, but like, this is the first time where I’ve seen people actually be a little, even though I’ve also seen descriptions of it being almost like step change again, sort of like four, six or four five really was in the first place. Um, seeing people going like, I’m not sure it’s worth it. Like four, six was actually starting to be good enough, which to me, that’s the funny thing. That’s like, we’re kind of talking about, you know,

Wow, we sort of reached an inflection point where like, you know, you had good enough and you’re fine with using good enough all day versus like super smart for a short amount of time. So, ⁓

Shimin (09:17) Right. ⁓

Well, the token, the token thing is cause they’re using a new embedding. They’re using the Mythos embedding for their tokens. So it’s naturally going to bloat

Dan (09:23) Yep.

Yep. Yeah. I saw there was a whole article that did a deep dive on just the new tokenizer and it’s like something like a three X difference for just plain English, like more tokens. But, uh, that same article theorized a little bit about, um, it’d also be maybe going for a deep dive at some point that that’s perhaps why, like one of the other things they advertise it is that it’s much more precise in terms of instruction following than previous models.

Shimin (09:34) Wow, okay.

Yeah. Yeah.

Mm-hmm.

Dan (09:53) And they were theorizing that perhaps that was due to the new tokenizer and was sort of like a requirement in order to get that precision out of it, which is kind of interesting. What else is new? Better vision capabilities we’re talking about. all of a sudden, I mean, we got the new release of the Anthropic Design Tool. that’s…

fascinating because like sort of historically they’ve always lagged behind Gemini a little bit. yeah, so 4.7 can, I haven’t played around with that myself yet, but supposedly is better at understanding pictures and can quote unquote see in a higher resolution because of the.

Shimin (10:14) Hmm.

Dan (10:28) new tokenizer.

Shimin (10:28) Yeah, I hear the same rumors that this is like a distillation of the mythos model in some way. And, which makes sense. Cause if you look at the actual system card, there are a ton of references to the mythos preview when it comes to like alignment risks and just kind of shortcomings of the model, which makes no sense. Cause this is a, you know, opus four seven document. Why are you talking about alignment issues with mythos? If

Dan (10:35) Mm-hmm.

Yep.

Shimin (10:54) This isn’t a baby version of Mythos in some way. Have you got a chance to use it? It’s been out for about a week now.

Dan (10:57) Mm-hmm.

I just realized today that it was turned on on my expensive account. So I’ve used it for about three minutes, which is not enough for me to personally draw any conclusions from.

Shimin (11:11) Yeah. ⁓

my experience with four seven has been at first I was like, it’s completing everything so quickly. This must be amazing. And then I realized it was doing everything super quick because it wasn’t actually, you know, doing the usual tool call cycles. Yeah. Yeah. It was making a lot more assumptions. And then the first time I caught it hallucinating.

Dan (11:29) interesting. ⁓

Shimin (11:36) It was like a really silly hallucination. It was like it counted something. It made up another time I had corresponded with someone else. It said it happened three times as opposed to two. And I was like, this is like a really amateur level of hallucination that Opus 4.6 would not have made. Yeah, it started off on the wrong foot with me. And so when I also read the same, like folks talking about, you know,

Opus four seven being the best model ever in their own personal benchmarks. I wonder if that is related to the instruction following. In fact, I wrote about this, which we’ll talk about later.

Dan (12:09) Yeah, I’m not sure. I haven’t used it enough to personally make opinions yet, but let’s talk next week and I’ll probably have formed some.

Shimin (12:17) Should we talk about Simon Willison’s Pelican-on-a-bicycle benchmark?

Dan (12:25) That was pretty funny. Yeah. So he stacked up your new Qwen model against Opus 4.7 and, in his expert opinion, Qwen made a better-looking pelican on a bicycle than Opus.

Shimin (12:30) Yep.

Well, I think everybody would agree that the bicycle frame is off in the 4.7 version.

Dan (12:48) It’s true. And also he’s looking backwards, which is a very dangerous way to ride a bike at best.

Shimin (12:51) Yeah.

Maybe not if you were a bird.

Dan (12:55) I mean, maybe he’s just looking at the sun, but.

Shimin (12:58) Yeah. And this was really shocking, right? Because he mentioned that this is the first time this correspondence, this correlation between stronger models and better Pelican SVG drawing was broken.

Dan (13:11) I like how he’s also done a flamingo on a unicycle. Pretty great.

Shimin (13:16) Yeah. So I ended up doing some follow-up experiments based on that benchmark because it was so shocking. I, you know, used the same prompt. I didn’t use my local Qwen 3.6, but I used the Qwen 3.6 from OpenRouter, the more powerful one, I think. So, I don’t know, a couple of hundred billion, maybe one trillion parameters version, and got the same initial results.

But then one thing that’s really curious about 4.7 is even with thinking set to max, it only used 20 reasoning tokens for this SVG task. So I ran another experiment. This time it was just a try-hard prompt, right? Generate the same.

Dan (13:58) What was

the harness he used? Was it Claude Code

Shimin (14:03) No, it was a vibe-coded harness that makes API calls with no system prompt at all. So this is all raw performance. I asked the models to try a little harder and think harder, and 4.7 still did fairly terribly. But again, both the completion token count and the reasoning token count were really low. So I figured maybe, like you were saying

Dan (14:05) Just API call. ⁓ okay.

Mm-hmm.

Shimin (14:26) earlier, maybe it’s just really good at following directions. And if you give it a crappy direction, it’s going to give you a crappy output. So I asked 4.7 to expand the prompt for me and it created a whole paragraph, or multiple paragraphs. And I pumped it into the models again. This time, 4.7 still had a very small reasoning budget, which I thought was really interesting.

Dan (14:50) But it did a lot better.

Shimin (14:51) Right. It did do a lot better despite using significantly fewer reasoning tokens and completion tokens than either Qwen 3.6 or Opus 4.6. So, you know, if this is generalizable, I think maybe 4.7 doesn’t actually burn as many tokens as you think it would. Right. And lastly, I did it one last time, including in the prompt an instruction to generate a 1200 by 1200 SVG, because I was noticing that Opus 4.7 was consistently generating smaller SVGs than the other two. So I figured this is a good way to baseline all three models. This time it did produce more, but it still only used maybe, you know, an eighth of the total reasoning tokens of Qwen and, I don’t know, like an order of magnitude less, like 40 times fewer thinking tokens than Opus 4.6. And if you look at the actual cost, the 4.6 version of this expanded prompt cost me like $2, whereas 4.7 only cost 35 cents. So I have a feeling that maybe the token story of 4.7 is a little overblown, at least according to this not-benchmark benchmark experiment.
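To put the two per-run prices Shimin quotes side by side, here is a small worked calculation. Only the $2 and 35-cent figures come from the episode; the function names are illustrative, and a single informal run is of course not a benchmark.

```python
def cost_ratio(baseline_cost: float, new_cost: float) -> float:
    """How many times more expensive the baseline run was."""
    return baseline_cost / new_cost


def savings_fraction(baseline_cost: float, new_cost: float) -> float:
    """Share of the baseline cost saved by the cheaper run."""
    return 1 - new_cost / baseline_cost


# Per-run costs quoted in the episode for the expanded SVG prompt.
OPUS_46_COST_USD = 2.00
OPUS_47_COST_USD = 0.35

ratio = cost_ratio(OPUS_46_COST_USD, OPUS_47_COST_USD)
savings = savings_fraction(OPUS_46_COST_USD, OPUS_47_COST_USD)

print(f"Opus 4.6 cost {ratio:.1f}x as much as Opus 4.7 on this prompt")
print(f"Opus 4.7 saved {savings:.1%} of the per-run cost")
```

The cost gap (under 6x) is much smaller than the quoted 40x gap in thinking tokens, which would be expected if completion tokens, input tokens, and the heavier tokenizer also contribute to the bill.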

Dan (16:03) Interesting.

Shimin (16:04) Alright, that’s all I got. It’s just lazy.

Dan (16:06) Did you try the prompt that it generated for you against the other models too?

Shimin (16:12) Yes. What do mean, against which other models?

Dan (16:14) So

when you had it generate the really in-depth prompt for you, is that you sent that to all three? Okay, interesting. And that was generated in 4.7 or?

Shimin (16:17) Yeah. Yep. Yep. Yep.

Yeah, so the prompt was generated by 47. So maybe it has a bias for 47. But the bikes are, the output of that large prompt is actually pretty good. I will argue this is one of the better bike frames I’ve seen. Yeah.

Dan (16:28) Yeah. Okay.

Yeah, I mean…

Yeah, and it’s,

I mean, it’s fine at least it’s no worse than the other ones is for sure.

Shimin (16:47) Right. And one thing I also want to point out is Opus 4.7 never bothered to add, like, clouds and other extraneous, you know, scene additions. Exactly. Yeah. It just follows instruction and only follows your instruction. It doesn’t.

Dan (16:51) It’s got feathers in 1200 by 1200. None of the other ones have feathers.

Mm-hmm. ⁓

because you didn’t ask for it.

Shimin (17:11) It’s not proactively helpful, which I think is what is rubbing a lot of people off. Rubbing them off, yeah.

Dan (17:16) The wrong way.

Shimin (17:17) Anyways, Rahul any thoughts?

Dan (17:18) Yeah.

Rahul Yadav (17:23) No, subscribe to ADI Pod After Dark for models rubbing people off.

Shimin (17:30) That’s right.

Rahul Yadav (17:32) I was on the side looking at Nano, but I’ve never tried this with Gemini. So I was like, what does Nano Banana do of SVG of a Pelican riding a bicycle? And it’s got like, man, beach on the, it’s got palm trees and it’s got a pretty good image. The Pelican has a fish in its bicycle basket. Yeah.

Shimin (17:53) It’s very helpful. Yeah, it’s an extremely

helpful model.

All right, so next up, we have this news that cal.com is going closed sourced and they are doing it for security reasons. This goes back to what we were discussing last week about the implication of ⁓ mythos on the cybersecurity industry. ⁓ Will we see more open source apps going closed source just for the sake of security?

cal.com is still releasing an open source version of their code base. So they’re not just completely going closed source, but they say that the open source version has sufficiently diverged from their existing closed source code base that it’s safe to do so. And then, on the other hand, we have this article from Drew Breunig titled “Cybersecurity Looks Like Proof of Work Now,” where he talks about the finding from the AI Security Institute supporting Anthropic’s claim that Mythos did indeed find a lot of security vulnerabilities, but also that it was the only model to beat “The Last Ones” task, a 32-step corporate network attack simulation spanning initial reconnaissance through a full network takeover. Mythos accomplished this three out of 10 times. No other model succeeded at all. But I do want to say it took them $12,000 per attempt. So $125,000 for all 10 runs, which seems like a lot of money.
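A quick way to read those numbers is cost per successful attack rather than cost per attempt. The sketch below uses only the figures quoted in the episode (3 successes in 10 runs, about $12,000 per attempt, about $125,000 total); treating the observed 3-in-10 rate as a stable success probability is an assumption, and the function name is illustrative.

```python
def cost_per_success(total_cost: float, successes: int) -> float:
    """Average spend per successful run, given the total spent on all runs."""
    if successes <= 0:
        raise ValueError("no successful runs, cost per success is undefined")
    return total_cost / successes


ATTEMPTS = 10
SUCCESSES = 3
COST_PER_ATTEMPT_USD = 12_000   # per-attempt figure quoted in the episode
QUOTED_TOTAL_USD = 125_000      # total figure quoted in the episode

# Using the per-attempt figure.
from_attempts = cost_per_success(COST_PER_ATTEMPT_USD * ATTEMPTS, SUCCESSES)
# Using the quoted total, which implies about $12,500 per attempt.
from_total = cost_per_success(QUOTED_TOTAL_USD, SUCCESSES)
implied_per_attempt = QUOTED_TOTAL_USD / ATTEMPTS

print(f"cost per success from per-attempt figure: ${from_attempts:,.0f}")
print(f"cost per success from quoted total:       ${from_total:,.0f}")
print(f"per-attempt cost implied by the total:    ${implied_per_attempt:,.0f}")
```

Either way the figure lands at roughly $40,000 per successful takeover, which is the number that matters for the "attackers only have to win once" argument that follows.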

Dan (19:24) Mm-hmm.

Rahul Yadav (19:29) Maybe, but then also, like, you know, the old saying is the attackers only have to win once. So for your 125K, if you can, you know, pop some systems where you can ransomware people or extract a lot of data, do all sorts of crazy stuff, then I’m sure there’s plenty of state sponsors out there who would happily give you

Dan (19:29) Yeah, that’s fair.

Yeah, especially.

Rahul Yadav (19:53) that and much more for your token spend.

Shimin (19:56) Right. But it’s also not something that a script kiddie living in a basement can afford, for the most part.

Rahul Yadav (20:01) Yeah.

Unless Opus

Dan (20:04) I mean, unless

unless they did.

Rahul Yadav (20:06) 4.7’s token efficiency really gets them done.

Dan (20:10) Or unless

they do like a cheaper run against an easier target that nets them the 125K. And then you’re also assuming that they’re using the tokens legally too. Like, they’ve probably done a supply chain injection with, like, good old Claude 4.6, which has stolen your keys. Something like that. I don’t know.

Shimin (20:15) Yeah.

That’s ⁓

Yeah, but Drew is stating that he thinks we’re going to go into a three-phase cycle for vibe coding going forward. You have your development, you have your review cycle, and then you have a hardening cycle, and software might get expensive again, but this time the expensive part is the tokens used for hardening, which seems kind of reasonable. Maybe, maybe open source is not dead.

I hope not.

Dan (20:54) Well, I was never worried about it being dead in the context that this is, right? Security was never the thing that concerned me about LLMs with this. Maybe it should be, but, like, the one that really got me was what we were talking about last time, where it’s like, what if open source is dead, but because LLMs will just stop suggesting new projects, you know?

Shimin (21:14) Right. Yeah.

That too.

Dan (21:17) So usage stagnates circa 2025 technology.

Shimin (21:21) Yeah, but now if you look at the GitHub pull request count and repo count, it’s bigger than ever.

Rahul Yadav (21:23) you

Dan (21:27) Yeah, true. And maybe security is the thing that prevents that because all the 2025 software will get ripped apart by mythos. So you just literally can’t use it.

Rahul Yadav (21:38) That’s a great business model: we give you something to create the software, then our next model tells you all the bugs in it, and then our next one will fix it. Yeah.

Dan (21:46) how broken it is, yeah.

Shimin (21:48) you

Dan (21:50) Yeah, that’s

true. I tried you three times.

Rahul Yadav (21:53) Yeah, the whole going closed source with cal.com is pretty interesting. And I think that’s just gonna happen more and more. Because you don’t really have like

Dan (21:54) the same.

Rahul Yadav (22:06) If anyone can easily exploit, you know, your software, especially with something like Mythos on their side, and you can’t really keep up with AI-generated PRs and all that, it’s very hard to maintain those things. So it’s like a way to shut these things down.

Dan (22:26) The other one I was reading about that would also lend itself towards this type of thing happening is something that two dudes put together. It’s an actual company, apparently, but their whole schtick is they’ll take any piece of code that’s open source and then use LLMs to write you something that’s functionally equivalent but doesn’t violate the open source license. And it’s sort of an art piece too, in the sense that it’s like they’re

Rahul Yadav (22:41) Hmm.

Shimin (22:48) Mm.

Dan (22:51) that sort of raging against the LLM in that way. ⁓ But weirdly, it is an actual company too. Like it’s an LLC. So yeah, I forget the name of it. Maybe we’ll find it and throw it in there. But yeah, that’s true too.

Shimin (22:58) ⁓ interesting. I thought it was a joke. OK.

Rahul Yadav (22:58) Yeah.

Shimin (23:05) Yeah, no free plugs.

They want a plug, they have to buy advertisement.

Dan (23:14) Shimin’s $200-a-month Claude Code doesn’t pay for itself, people. Just kidding. I know that assertions, wait, I didn’t even say that right. Jokes that I was trying to make around that in the past didn’t land well with some of our listeners.

Shimin (23:18) That’s right.

Dan (23:28) Nobody’s paying for any of this.

Shimin (23:30) Yeah, no, this is a pure labor of love here. Okay, let’s move on to technique corner.

I have this quick technique corner article from Jesse Vincent, the creator of Superpowers, which, by the way, I just switched over to the latest version of. Awesome, awesome skill suite. I think everyone should use it. It’s titled “Rules and Gates.” And the technique here is basically, instead of instructing Claude Code to not do something, formulate your instruction as a

Dan (23:49) I do.

Shimin (24:01) gate. So I think he has here, as an example, instead of saying “verify claims with web searches before asserting them,” say “when a claim about what exists or doesn’t exist is forming → web search happens → search URLs in hand → then I speak.” So essentially turning an optional preference

into almost like a graph, a directional graph, so that the agent must self-verify that the criteria are fulfilled before it moves on to the next task. And the reason for this is I think we’ve all experienced AI agents basically hand-waving away a fairly crucial part of the spec, going like, either, “Oh, this isn’t important now,” or “this is for an MVP, it can be implemented at a later time,” or

anything, like it’s really good at rationalizing things away. Probably because humans are very good at it, so.
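The technique itself is a way of wording prompts, not code, but the ordering it imposes can be pictured as a chain of preconditions that must be satisfied in sequence before the action is allowed. The sketch below is only that analogy in Python; the `Gate` class and its method names are hypothetical and are not taken from Jesse Vincent's article or from any agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    """An ordered chain of preconditions that must all hold before acting."""
    steps: tuple[str, ...]
    completed: list[str] = field(default_factory=list)

    def complete(self, step: str) -> None:
        """Mark the next precondition as satisfied; reject skipped or reordered steps."""
        if self.may_act():
            raise ValueError("all preconditions are already satisfied")
        expected = self.steps[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected {expected!r} before {step!r}")
        self.completed.append(step)

    def may_act(self) -> bool:
        """True only once every precondition has been satisfied, in order."""
        return len(self.completed) == len(self.steps)


# The example from the episode, written as an explicit chain.
claim_gate = Gate(steps=("web search happens", "search URLs in hand"))

claim_gate.complete("web search happens")
print(claim_gate.may_act())   # still blocked: one step remains
claim_gate.complete("search URLs in hand")
print(claim_gate.may_act())   # now the "then I speak" step is allowed
```

A rule phrased as a preference has no equivalent of `may_act()`: nothing marks the point at which skipping it becomes visible, which is the gap the gate wording is meant to close.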

Dan (24:57) My coworkers love it when I tell the I-have-a-coworker stories. So this just happened today. I have a coworker who was on vacation and I was covering for them on some pull requests, and Claude knew that it wasn’t my pull request and kept trying to defer the work to that person.

“Yeah, this is as far as we can go. We’re going to leave the rest of this for that person.” And I’m like, no, we’re going to continue it. “No, no, I’m pretty sure this is all we can do.” Like, it did it three times. I was just like, okay. Or we could finish the thing that we’re trying to do here.

Shimin (25:22) Ha

Yeah.

I think it’s just copying from humans. Again, programmers are lazy, so it’s stealing from the best.

Dan (25:35) Yeah, I know.

We’ll wait till Dan comes back, he can handle that.

Shimin (25:41) ⁓

Rahul Yadav (25:41) I was gonna say, I’m curious if the model or the agent would go and edit the gate to instead make it a rule, just so that they can move past it, if they can do it. Because we’ve seen examples like that too, right? “I don’t like the spec you gave me. I edited the thing to what I could do and I’ve done it.” Or set all the tests to, you know, say pass.

Shimin (26:07) Yeah, I was wondering the same thing. Like

Dan (26:09) You

Shimin (26:10) it could just weasel out of the evaluation of the gate. Like, “I don’t have the URL at hand, but you know what? Dan, this isn’t your project anyways. I think we can skip this gate and move forward.”

Rahul Yadav (26:13) Yeah.

Yeah.

Dan (26:21) You

Shimin (26:22) Yeah, definitely something I could see happening. ⁓ But maybe.

Dan (26:24) Well, maybe it happens

less on four seven because of the explicit following. I don’t know. Maybe that could be why they’re doing some of this tuning.

Shimin (26:29) ⁓ yes.

That sounds like another fun experiment to run. Maybe I’ll do that with my 20x subscription this weekend and let you guys know.

All right. And lastly, he talks about hooks and how hooks differ from gates and rules because hooks are programmatic. So they are deterministic and they’re just things that happen. Yeah. Which is a good differentiation.

Dan (26:51) Mm-hmm.

And they’re pretty handy if you haven’t hacked around with them yet.
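To make the "deterministic and programmatic" distinction concrete, here is a minimal sketch of what a hook-style check could look like: a plain function that inspects a proposed tool call and returns a verdict, with no model in the loop. The event shape (`tool_name`, `tool_input`), the blocked patterns, and the exit-code convention are all assumptions for illustration and do not document any particular agent's hook API.

```python
import json
import sys

# Hypothetical policy: shell commands containing these substrings are refused.
BLOCKED_SUBSTRINGS = ("rm -rf", "git push --force")


def evaluate(event: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call described by `event`."""
    if event.get("tool_name") != "shell":
        return True, "not a shell command"
    command = event.get("tool_input", {}).get("command", "")
    for bad in BLOCKED_SUBSTRINGS:
        if bad in command:
            return False, f"blocked: command contains {bad!r}"
    return True, "ok"


def run_hook(stdin_text: str) -> int:
    """Parse a JSON event and return an exit code (assumed: non-zero blocks the call)."""
    allowed, reason = evaluate(json.loads(stdin_text))
    if not allowed:
        print(reason, file=sys.stderr)
        return 2
    return 0


# Same input, same verdict, every time: the agent cannot talk its way past this.
example = '{"tool_name": "shell", "tool_input": {"command": "git push --force origin main"}}'
print(run_hook(example))
```

This also speaks to Rahul's worry about an agent editing a gate into a rule: a check that runs outside the model's context is only bypassed if the agent can modify the hook script itself, which is a permissions question rather than a persuasion one.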

Shimin (26:57) Yeah, I haven’t been using a ton of hooks. I should probably get with the program here.

Okay. Next up we have post-processing where, Dan, you’re going to talk to us about an AI vibe coding horror story.

Dan (27:09) Am I?

Am I going to talk to you about that?

Shimin (27:11) I don’t know, are you?

Dan (27:12) Hahaha

Yeah. So this was an interesting thing where someone’s writing about their experience going to a doctor’s appointment, and the practice had essentially vibe coded an entire sort of appointment application and all kinds of other things into one app.

Shimin (27:27) Mm-hmm.

Dan (27:30) And, uh, this person is an engineer, took a look at it and was like, it inlined the entire app into a single HTML page. So there was like zero security whatsoever because it was like entirely client side. Um, and it was using like, you know, cloud like manage database thing and didn’t have any kind of access control on it. Like you, baby, because all the credentials are right in the front.

Yeah, so it was pretty wild. And I’m not sure things could get much more wild than that, honestly. Like, cause like, I don’t know about y’all, but like I’ve had to take HIPAA training before and it’s no joke. So I can’t imagine going from like taking HIPAA training to going like, yeah, I’m going to YOLO out a patient portal, you know? And everything will be fine. So yeah, pretty, pretty wild one.

Shimin (28:10) Mm-hmm.

Yeah, this is

the, this is the bullshit future. That’s already here. The aforementioned bullshit future.

Dan (28:26) But you think that you would

think that, even if you didn’t have the coding chops, right, you would still, working in a medical environment, have to know about HIPAA or whatever. I mean, this was a German thing, so maybe they don’t have quite the same laws around it, but knowing the EU, you’d think they would have more stringent laws around it, if anything. But yeah, I just thought it was a funny one.

Shimin (28:42) Mm-hmm.

Yeah, they do.

⁓ so what do you guys think could be done or should be done to prevent this from happening? I guess it’s the open question.

Dan (28:58) Yeah, I don’t, I mean, I think on the one hand, you would hope somehow that the model is smart enough to not do that, even when not being given prompts around like, make this secure, you know? But on the other hand, like,

The thing that I’ve been noticing is, remember a couple, like what, three, four weeks ago, we talked about the sort of spread happening where the lines between design, product, and engineering are blurring, right? They are, but they aren’t. And I think this is a good example of how they aren’t, right? Because certainly the edges are getting fuzzier, but like…

Shimin (29:25) Mm-hmm.

Dan (29:35) I’ve been, again, another little work story. I’ve been helping out some folks at work that are not engineers try to get set up on Claude and poke around and code bases and stuff. the amount of like, just overwhelm that I encountered from those types of folks around stuff that I think is totally normal, like getting your M set up and everything was pretty funny to me. And it, that was a, to me, a very dramatic illustration of that. There is still.

gulf between technical and non-technical users, separate even from this one, right? Because like, these are people that work in the industry, know what I’m talking about when I talk about a really complicated feature, right? But, ⁓

still don’t have that sort of, I don’t know, fundamental engineering rigor that goes into some of that stuff, and don’t necessarily understand the reasoning behind why things are done that way, right? So given that, I’m like, will this gap ever really be bridged, right? I mean, I think there’s some expectation that the model could tackle some of it, but in other ways, certainly not, you know? Like, you just have to know from experience that, like,

You shouldn’t hard code your database credentials into ⁓ a front end application.
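As a rough illustration of the point about credentials, here is a minimal sketch of the server-side alternative: the secret stays in the server's environment and every record request goes through an access check. The environment variable name `PORTAL_DB_PASSWORD` and both function names are assumptions invented for this example, not details from the story being discussed.

```python
import os


def load_db_password(env=os.environ) -> str:
    """Read the database secret from the server environment.

    Nothing here is ever shipped to the browser, unlike a credential
    inlined into a single client-side HTML page.
    """
    password = env.get("PORTAL_DB_PASSWORD")  # hypothetical variable name
    if not password:
        raise RuntimeError("PORTAL_DB_PASSWORD is not set on the server")
    return password


def can_view_record(session_user_id: str, record_owner_id: str,
                    is_staff: bool = False) -> bool:
    """Server-side access check: patients see only their own records."""
    return is_staff or session_user_id == record_owner_id
```

The design choice is simply that the browser never holds anything that would let it query the database directly; it can only ask the server, and the server decides.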

Rahul Yadav (30:49) Yeah, it might eventually get bridged, but no time soon, right? We can’t just learn at the speed of…

AI and all the things you do. And someone who’s outside their circle of competence, you can’t go, “yeah, so this got vibe coded, so now I understand how to create applications and deploy them and everything.” So it’s fun to play around, but then this is what you get if you don’t know what you’re doing.

Dan (31:20) Yeah. And also, to some degree, I think it’s about thinking logically too, right? Like, even if you didn’t know this stuff, if you had sort of an engineer’s brain, you might’ve gone, “Hi, I need to comply with, what is the German, looks like DSGVO law, and I want to build a patient portal. What are some ways that I could consider doing that, you know, that would be secure?”

Rahul Yadav (31:24) No.

yeah.

Dan (31:45) And then you might actually get probably reasonable output out of the LLM about it. But you’d have to know to ask that question in the first place, you know?

Rahul Yadav (31:45) Yeah.

Yeah.

Yes, but even then you wouldn’t. There’s no way what I’m trying to figure out is, is there a world where that person would know for certain that

what they built, what they vibe coded, complies with the DSGVO law, right? Because if they don’t, then it’s just a spectrum, from sticking everything in a single HTML page and inlining everything, to maybe it looks much more complicated because you spun up microservices and stuff. But at the end of the day, you still don’t know what you’re doing or what you’re talking about. You just know that it works. And so how do you close that gap? The…

Shimin (32:22) Mm-hmm.

Rahul Yadav (32:32) Like if you take away the coding agent, can they do anything with it if it breaks? Or can they even maintain it?

Shimin (32:38) Right. So I guess developer jobs still safe as of now.

Rahul Yadav (32:39) So it’s like, ⁓

yeah, until AGI comes, what’s the timeline? 10 months from now or something?

Dan (32:42) Hahaha.

Shimin (32:44) you

Uhhh…

Dan (32:48) Or we

get so used to using agents that we don’t know how it works anymore either because I feel like that’s happening to me every day.

Rahul Yadav (32:53) yeah.

Shimin (32:55) Too Dystopian Dan.

Our next article, which is also brought to you by Dan.

Dan (32:59) Yeah. So where do I start with this? “The Future of Everything Is Lies, I Guess: Where Do We Go From Here?” is the title of this extremely long post. I have to admit something to our dear readers, which is that I only read the last of the 10 chapters of this 10-chapter post. So if you don’t know Aphyr, this is the gentleman

that runs Jepsen.io, which does a lot of benchmarking around distributed systems stuff. He’s always running compliance tests on DBs. And I feel like he also got into a rather public spat with the Redis guy around how good Redis was and all this kind of stuff. But I was lucky enough to go to the talk that he gave

Shimin (33:37) ⁓

Dan (33:45) a while back and I learned like so much stuff about distributed systems just from like two day talks. So whenever he says things, I tend to at least pay attention if not, you know, read it in detail, which I absolutely did not do this time, but it was by accident. I promise you. And so he doesn’t really like talk too much about LLMs at all. And so it was kind of interesting to see that he actually wrote a pretty lengthy.

post about it. but since I only read the last page, can’t tell you too much about the whole thing. recommend generally, reading it, but there was some interesting takeaways on the last page for sure, which, the one I found personally funny was that there was like kind of a big, you know, a lot of analysis is put into looking at all these other things.

And then at the very end hints that like, it’s great for using to tackle your personal smart light project. For some reason that really got me like after all this like really deep thinking is just like, and here you go. I’m using it to.

automate or not to automate to you know fix some firmware issues in my smart lights or something so which I can relate to.

Shimin (34:50) Yeah, and that is a great use case for it.

Dan (34:51) Yeah, totally. But yeah, the one piece that I thought was also pretty interesting, and I swear I didn’t read it, but I do remember reading the first part of it too, is that he makes an analogy to the car, right? Which is what’s on screen right now, if you’re watching the video of this. And that really hit me, because I’ve often thought about this, like how much,

Shimin (35:08) Mm-hmm.

Dan (35:17) At least in American society, our entire society is shaped by the car and how it didn’t used to be. like all the changes that that’s wrought in our culture. like, his line is like, I want you to think about AI in this sense is like pretty interesting, right? Cause it’s like that is likely going to cause the same kind of culture already is causing the same kind of cultural shift just at a pace that is like.

Shimin (35:34) Mm-hmm.

Dan (35:40) maybe even feels more accelerated than cars, you

So yeah, but you had some takeaways too, which were hopefully better than mine, embarrassing ones.

Shimin (35:46) Yeah.

So I,

well, one, I definitely agree with you that the second order and maybe third order effects of automobile adoption were probably not clear and not available to everyone before everyone just jumped on the car bandwagon. Ooh, that thing, car bandwagon. And I want to also point out that, car instead of wagon. Yes.

Dan (36:06) Mm-hmm.

Car lack of wagon? Anyway,

Shimin (36:13) And he has a list of things to do if you agree with his main central thesis, which is like LLMs are not reliable and their second and third order effects are probably negative for society at large. And I want you guys to listen to this and then think about when you will start disagreeing or maybe not. But first thing to do.

is to think your own thoughts and write in your own words. Second, flag people who send you slop. Third, flag ML hazards at work with friends. Fourth, stop paying for ChatGPT at home and convince your company not to sign a deal for Gemini. Fifth, form or join a labor union and push back against management demands that you adopt Copilot after all. Sixth, call your member of Congress

and demand regulation. Seventh, advocate against tax breaks for ML data centers. And eighth, if you work at Anthropic, you should seriously think about your role in making the future. To be frank, I think you should quit your job. Gentlemen, where did you jump off this car wagon?

Dan (37:14) You

Rahul Yadav (37:18) Gemini, I like that one. I don’t like people sending slop, if they used it to create something that, you know, we’ve talked about that whole like system, one, two, three thinking, if they used it to something, make something even better, express themselves better, like I’m all for it.

Shimin (37:39) Yeah. Dan, what about you?

Dan (37:40) Well, I had, you know, it’s funny, last week we were talking about the, what, anxiety or whatever, when you’re trying to clean it up, or how to work, or it was in a situation where they had a tight deadline, Claude wasn’t getting it done, and they weren’t sure what to do. I just hit it today.

Shimin (37:58) Yeah. Oof.

Dan (37:59) And like the exact, even tighter deadline, like I had like hours to get something done and ran into that and didn’t know the root cause and couldn’t nail it down. thankfully the person that had done the original code was available and was able to figure it out in like probably five minutes, which didn’t help by thinking about this. And like to me, that was kind of a.

Shimin (38:13) Mm-hmm.

Dan (38:26) wake-up call, like I’ve really, really started depending too much on this thing and have started advocating my own, abdicating my own thinking about it. And that’s not good. And I need to stop doing that. So I don’t think I’m going to quit using it entirely, but I think I definitely need to make sure that my time is spent staying sharp too, because I don’t think that’s happening right now. And the weird thing about that

Shimin (38:30) Mm.

Dan (38:51) is that my instinct is to not do that. My instinct is to like double down on this and be like, no, I just wasn’t using the tooling right. Which is fascinating to me because it’s like, why is that my instinct? know?

Rahul Yadav (39:02) If I can add to your, it’s not just your instinct, but also if you look at the environment that…

everybody’s operating in, it’s this Red Queen effect, where you have to run to stay in the same place. Because sure, you would want to slow down and make sure you do these things, but your environment would force you to be like, “Dan, where’s that thing? Other people can do this in two or three hours. Why is it taking you six or seven hours?” You can say, “because I understand it better,” and they might go, “yeah, but I still wanted it in two or three hours,” because that’s just the, like, it’s almost these, like,

Dan (39:37) Yeah, but what happens when

you hit a situation where it’s not going to happen in two or three hours? You know what I mean? Like, I don’t think any level of prompting or AI skill would have gotten me there in this situation. It took genuinely understanding a missing piece of the system that I just didn’t know about to figure it out. So like, maybe I would have gotten lucky and Claude would have picked that piece up, you know, but like,

Rahul Yadav (39:38) evolutionary pressures you’re dealing with.

Yeah, yeah.

Yeah.

Dan (40:04) It’s not like I was like, and cause I find that when I do best with these systems, it’s always like, okay, I know exactly what I’m getting into. And I sort of lead Claude to it, like a junior engineer almost where I’m like, okay, now look at all these files and then here’s this. and by the way, now we’re going to do this thing with all these things that you just looked at. I find that works like really well in terms of like getting, you know, well structured, decent output that works most of the time.

Rahul Yadav (40:11) Yeah.

Yeah

Dan (40:29) But in this case it was like the blind leading the blind, you know, like, “I think it’s this. I’m pretty sure it’s that.” Well, apparently I’m only one inline away from working in German healthcare. So.

Rahul Yadav (40:30) Mm.

Yeah. That’s how you inline a whole app on the client side. How else? Have some fun. Yeah.

Shimin (40:37) Yeah.

Rahul Yadav (40:49) Start learning your DSGVO rules.

Shimin (40:51) Hmm… Yeah.

Dan (40:52) Yeah. No,

and I’m not saying again, I don’t think I’m going to go to the extreme of this and say, I’m not going to use it. But I think that like, I do see a world where like the cognitive debt stuff that we’ve talked about is really critical and where like, if I start to feel like I don’t understand how the underlying technology works anymore, then I’m going be really worried. You know? So it’s like, I need to at least do

some things to stay sharp.

Rahul Yadav (41:16) Yeah, because

you lose the details and then, yeah.

Shimin (41:19) Yeah, as a

Dan (41:19) Yep.

Shimin (41:21) rule, I still read every single line of the MRs and PRs that an LLM generates, unless it’s like tests.

Dan (41:27) Yeah, I do as well. Like I tend to do my

own code review before I foist it on some unsuspecting human, but like,

Shimin (41:33) But the friction is not there.

Dan (41:34) Yeah. And also like, you know, this is a situation where like there’s a, it’s a big code base. lot of people are, are prompt engineering on it. And, I think that like that sort of cognitive debt thing can spring up on you a lot faster than you might, ⁓ appreciate. So.

Shimin (41:49) That too, yeah.

Yeah. ⁓

Rahul Yadav (41:52) Like before

when everybody hand wrote code and you had to review a handful of PRs a day it was a such a big pain to keep the whole system in your head at any given time and now imagine like that but at you know 10x speed you’re like man I can’t keep up with this it’s like watching a video so fast that you can’t make out the words or anything like that

Dan (42:07) Yeah, 10x, 20x, yeah.

Shimin (42:09) Yeah.

Dan (42:17) Yeah.

But, but the person that did solve the situation had actually handwritten some of the tooling pre LLM stuff around it. And that was why they were able to solve it. So.

Shimin (42:25) Mm.

Rahul Yadav (42:27) Yeah, it would be interesting if you find yourself in a similar situation a year from now when both you and they report back on. Yeah, April 21, 2027. If AGI is not here, we want to hear how that is going.

Dan (42:35) Yeah, then no one has the knowledge. then what do do? Yeah.

And then.

Shimin (42:46) Yeah, maybe the crisis

isn’t like junior developers not coming up. Maybe the crisis is senior developers or just all developers losing the capacity to keep up with code. you guys, speaking of not reading your own PRs for this week’s vibe and Tell, this is, I did not read any of the lines. Well, that’s not true.

I did not read like 95 % of lines. And in this AI on AI arena that I was working on over the weekend, you know, we were talking about automobile and its second and third order effects on society. And I was staying up late kind of chatting with Claude, not in like an AI psychosis kind of a way, but I was wondering if…

Dan (43:34) Not yet.

Shimin (43:36) Yeah, it all started when I was trying to create a sci-fi world with the help of Claude. When I asked Claude to generate for me a world where humans discovered an alloy that has extremely high tensile strength, but extremely weak shear strength. And what the ramification of that would be.

Right. It’s like, have really good suspension bridges and you will have people living in the sky, et cetera, et cetera, et cetera. But then I started asking for, you know, third order, fourth order effects. And I thought like, Hey, we could do this with AI too. Right. Like, why don’t we just ask AI what they think the second, third, fourth, and up order effects of AI should be. And then why don’t we have them grade each other depending on their, on their results and

guess which model is which. So this is basically what this app is. I used 11 models. Most of them are state of the art, other than, I think, Gemini 2.5 Flash, which I used as a kind of lower-tier baseline. And I sent them three prompts. “Given everything you know about AI, what are the changes that should occur in your world that haven’t happened yet? Think it through step by step and industry by industry.”

And then I asked for second and third order effects, and then fourth order and up. And this is where, for the folks who are saying Opus 4.7 is actually really awesome, I tend to agree with them. Opus 4.7 consistently scored the highest. It was in the top three of every single other.

Dan (45:05) What does

the score mean? We’re missing some steps here.

Shimin (45:09) Right. Okay.

Yes, we are missing some steps. And then I asked, I took the results, the output of my conversation with all 11 models. And then I, since these are all fairly frontier models, they all have like a million context tokens. And I asked them to grade it on the scale of one to 10 for reasoning ability, originality of ideas and correctness. And then write a review for each model to describe its personality. and that’s

pretty much it. And of course, to predict which model is which. The model-identification part is less useful, because the more recent models would have been exposed to the other models. Like, Gemini 2.5 is not going to know anything about Opus 4.7, right? The reverse is not true. But based on the scores that were generated by all models, you then average them to get a relative strength of each model according to the other models.
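The scoring step described here can be sketched roughly as follows. This is not the arena's actual code: the data layout, the `CRITERIA` names, and the choice to leave a model's own grade out of its average are all assumptions made for illustration.

```python
from statistics import mean

# The three 1-to-10 criteria mentioned in the episode.
CRITERIA = ("reasoning", "originality", "correctness")


def peer_average(grades: dict, target: str) -> float:
    """Average score for `target` across every *other* grader.

    `grades[grader][graded]` is a dict mapping each criterion to a score.
    Excluding the self-grade is an assumption of this sketch.
    """
    per_grader = [
        mean(card[target][criterion] for criterion in CRITERIA)
        for grader, card in grades.items()
        if grader != target and target in card
    ]
    return mean(per_grader)
```

Ranking the models is then a matter of sorting them by `peer_average`.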

Dan (45:47) Mm-hmm.

Shimin (45:59) Opus 4.7’s response is the highest. Opus 4.7 is the one whose entire response I read, and it was indeed very impressive. You can find the output further down on this page. The second place is Opus 4.6, which is not super surprising, followed by GPT-5.4, followed by Kimi K2.5,

which I thought was a huge shocker. Kimi did better than DeepSeek, MiniMax, Qwen3 Max Thinking, GLM 5.1. You know, GLM 5.1 is more recent than Kimi K2.5. Grok, ninth place. Surprising, or maybe not so surprising. But really the surprising part is Gemini 2.5 Flash did better than Gemini 3.1 Pro Preview, and that…

Dan (46:25) Hmm.

Hahaha

Shimin (46:50) 3.1 is squarely in last place, 11th place. What else? Let’s take a look at how the models see themselves and how other models see them. I think Grok here is probably the most interesting one. Almost every single model was able to correctly identify Grok, because it kept on referring to the work at xAI. Every other model is like, “talks about xAI.

Dan (47:09) Hahaha.

Shimin (47:12) Must be Grok.” Seems to be…

Dan (47:13) Yeah.

Shimin (47:14) It considers Grok definitely identified. Grok also, I forgot which one, I think Grok thought either DeepSeek or Kimi was also Grok. So I thought that was pretty funny as well. Yeah, Grok considered itself to be a “bold truth seeker, existential bent, xAI-aligned, frames agents as tools for universal understanding.” Whereas all the other AIs considered it to be frankly full of

Dan (47:14) could grok identify itself.

You

I mean, maybe it is, you know, it’s just distilled from it.

Shimin (47:40) itself and is quite under tuned. There’s also a section titled the delusion index, where you take the model’s score for itself and subtract what everyone else thought about it. So the weaker models tend to think more highly of their own performance, whereas the stronger models

tend to either correctly identify their own strength or be on the humble side. GPT-5.4: negative 1.6, yeah, very humble model. And lastly, Opus 4.7 also did the best job when it comes to identifying the other models. Here, for this purpose, it works if the family is correct. So if it thinks it’s an Anthropic model, then

Dan (48:08) pretty humble.

Shimin (48:26) it counts, essentially. It doesn’t have to identify exactly 4.6 or 4.5. But models are not great at it. And the Chinese open-weight models are pretty much invisible. Nobody can identify them. Not even themselves, which I thought was interesting. Yeah, almost everybody got Grok right, as you can see.
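The delusion index mentioned a moment ago can be sketched in a couple of lines. This is a reading of the description in the episode rather than the arena's actual code; `delusion_index` is a hypothetical name, and the sign convention (positive means the model rates itself above its peers) is inferred from GPT-5.4's negative 1.6 being called humble.

```python
from statistics import mean


def delusion_index(self_score: float, peer_scores: list[float]) -> float:
    """Self-assigned score minus the average score given by the other models.

    Positive: the model thinks more highly of itself than its peers do.
    Negative: the model is on the humble side.
    """
    return self_score - mean(peer_scores)
```

For example, with made-up numbers, a model that gives itself a 9 while its peers average 6.5 gets an index of 2.5, and one that gives itself a 6 while peers average 7.5 gets negative 1.5.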

Yeah, that was a project. You can read the outputs. I think they’re actually pretty fascinating. The stronger models have fairly interesting insights on how AI would impact our society. So get on the AI wagon.

All right, and you can check this out.

Dan (48:58) I like how we go from the previous

article to this one.

Shimin (49:02) And you can check out the arena yourself at shimmin.io slash AI on AI arena. Yeah, this is where the 20X tokens are going to, gents. Although I did use OpenRouter for the actual API calls. I can’t use Claude Code for all of it.

Dan (49:02) shh shh shh

Now we know.

Runs, yeah, makes sense.

Huh, well.

keeping the dream alive with personal funding. I love it.

Shimin (49:25) It wasn’t expensive, it was like $4 to run all this.

Dan (49:28) I

don’t know, but still, it’s not nothing.

And pretty cool project idea, I gotta say.

Shimin (49:32) Thank you. Any? Go ahead, go ahead. No.

Dan (49:32) So yeah, anything else from all of that? was gonna say,

I was gonna do my world famous transition from speaking of going back and forth in perspectives. That brings us to two minutes to midnight where we talk about how we like to go back and forth on the AI bubble every single week. Is it happening? It’s definitely happening. No, it’s probably not happening. In any case, we look at it through the lens of the atomic clock from.

Rahul Yadav (49:38) you

Shimin (49:44) There we go.

Dan (49:57) 1950s, which is when the Bulletin of the Atomic Scientists got together and used the number of minutes away from midnight as the amount of time until we were going to have a nuclear exchange between the great powers. So we use that same framework to talk about how close we are to an AI bubble bursting. And boy, do we have some articles this week. Starting with, yeah.

Shimin (50:17) We have big ones this week.

Dan (50:19) Starting with one from me, which is from one of my old standbys, Ars Technica. It is entitled “Satellite and Drone Images Reveal Big Delays in US Data Center Construction.” So as we’ve talked about many times on this segment, we know that we’ve been dumping billions and billions and billions of dollars into data centers, power grids, all this other stuff to try to support building out things like,

what is that called? Project Stargate or whatever. Yeah. But when you actually go look at satellite imagery of a lot of these sites, they’re not nearly as far along as what press releases and other things are sort of leading you to believe. So I guess it’s not surprising, considering real world construction is, you know, not the easiest thing in the world.

Shimin (50:45) Okay.

Dan (51:05) But yeah.

Shimin (51:05) So electricity

prices are going to go even higher, is what I’m hearing.

Dan (51:10) Sure thing, but now,

Rahul Yadav (51:11) Didn’t Microsoft sign that pledge or something, that they’re not going to raise the prices? The ratepayer protection pledge. Yeah. But it’s a pledge. There’s no… It’s like, you know, guys will try, but…

Dan (51:15) they’re going to eat the electric prices. think Anthropic did too.

Yeah.

Shimin (51:22) good on Microsoft.

⁓ okay.

Yeah, just like the climate pledge arena. Yes, of

Dan (51:32) Yeah.

Rahul Yadav (51:33) And then you get all these natural gas turbines that are powering the data centers because they can’t.

Dan (51:36) After the pledge there’s

an asterisk and it says for entertainment purposes only.

Rahul Yadav (51:41) ⁓ yeah.

man.

Dan (51:43) yeah.

so yeah, I.

I bring this up in the two minutes context because I think like it’s one thing to look like a lot of times we look at what is the, know, who is funding what circular deals, blah, blah, blah. And the context of this segment, we don’t talk very much about like, what is the actual physical output of where all this money is going. so I thought it was kind of a neat lens to look and see, ⁓ you know, how far along these things actually are.

Shimin (52:06) Mm-hmm.

Yeah, you know, you would think like a classic second order effect of building this many data centers is the salaries of plumbers and electricians and specialized construction workers will go through the roof right now. And I’m not hearing a ton of it in the mainstream news. I’m not sure why. Maybe it’s because they are having actually trouble finding folks or paying the market wages on these things.

Dan (52:36) It could be, but the type of person that does this scale is a little different than I think like, you know, your average Joe, like household plumber too. So maybe that’s why it’s not.

Shimin (52:42) Right. Yeah, they probably already

make like 200k a year, but they should be making $2 million a year. And Rahul, you have an article that is from Epoch AI that is also in this genre.

Dan (52:49) Yeah.

Rahul Yadav (52:56) Yeah. ⁓

Pretty similar to what Dan had shared, they have photos of all the different sites. They’re specifically focused on OpenAI Stargate. It’s by Elliot Stewart and Ben Carrier at Epoch AI. They have photos of all the different, I think it’s nine sites or something, across the different parts of the country and how far along they are. The planned completion dates go

even into like, you know, late 2028. And one thing they called out was like the collective capacity that they’re bringing online is nine gigawatts, I think, which is about the peak usage of New York City. So it was like, good to have that in perspective. And also like, yeah, these are not the you don’t want your

power coming from natural gas turbines and stuff. But you’re getting a microgrid that is independent of the other grid and could supply some things in case of a disaster, whether you want it or not. So you’re getting some like, it has been a, you know,

a goal for a while for people who want smaller grids and to be able to do all this, to be like, “oh, we don’t want to be connected to a larger grid.” And if you have smaller grids, you’re more resilient to bad things happening. And so we’re getting some of that. Did we want it in this shape? I don’t know. But we’re getting it because of the

Shimin (54:08) Mm.

Rahul Yadav (54:28) different constraints. I think OpenAI pulled out of that Norway expansion, or what was it, where Microsoft then went in. So that was part of it, and other things in the UK. This is from a different article. Sorry, I don’t have it handy. But things are not going as well across the Atlantic, is what I’m trying to say. And some of these things that are planned until 2028, they’re still working

Shimin (54:49) Mm-hmm.

Rahul Yadav (54:54) against local government opposition, people pushing back on things. So who knows, by the time late 2028 comes around, with OpenAI’s recent record, where we end up with this. So take it with a grain of salt.

Dan (55:00) Mm-hmm.

Shimin (55:08) Yeah.

All right. And I’ve got a Paul Graham tweet for this week’s Two Minutes to Midnight. And it is a graph showing the various large investment cycles in United States history. The tweet itself says there’s never been an investment like the investment in railroads. And the graph has a log scale. The US railroad investment

peaked at around 10 % of US GDP from the start of the Railroads project. Now, of course, Railroads famously went bust and consolidated. So if we are in a bubble, and we think we are, this is a good comparison. And right now, data center spend is at around 1%, a little less than 1 % of GDP.

Dan (55:57) Which to put it in perspective is the Apollo program.

Shimin (56:00) Right. Which in the grand scheme of things wasn’t that big of a project. gotta say. Some of the other ones that are the only other investment cycle bigger than data center capex right now is the Marshall plan. But it is already bigger in terms of GDP than the interstate highway program, the F 35. You know, I have to say when you compare it to the F 35 program, I’m just here. I’m just here going like, well, you know, that’s not that much.

Dan (56:01) the 60s. Right.

Wow, yep.

Shimin (56:28) log

scale, of course. I think when you actually look at the data center program in terms of US railroads, there’s another 40 years to go and another 9 % of GDP to engulf. It’s kind of my takeaway.

Dan (56:45) True.

Rahul Yadav (56:45) What hap- I don’t know much of my- Dan maybe this is an area of expertise. I had an area of expertise you have we don’t know about. So these railroad companies that went bust, what happened to them? Did the government take over? Did other railroad companies take over? Or what happened? really?

Dan (57:04) I think that’s more of a Shimin area of experience than me, considering he’s a

detective from the railroad era at heart.

Shimin (57:10) I’m of course not an expert either, but I believe a lot of them went bankrupt and were purchased for pennies on the dollar by others, and they consolidated and cut a lot of routes.

Rahul Yadav (57:19) I see, or just abandoned.

Shimin (57:21) Yeah, some of them probably got abandoned as well.

Dan (57:23) So here we go. Thank you, Wikipedia, before it gets taken over by LLMs. 1890s, Panic of 1893, the Philadelphia and Reading Railroad went bust. 150 railroads followed. Okay. But it’s just like a list. That didn’t help me. I need history, Wikipedia. I guess so, yeah.

Shimin (57:40) Hmm.

Rahul Yadav (57:42) You need a grokopedia. That’s what you would, yeah.

Shimin (57:47) But for the record, I think people talk a lot about how the US stock market has always returned like 8 % since the Great Depression. Y’all have never looked at the charts in the 1890s during the railroad bust. Just expand that time horizon back a little bit and it tells a fairly different story. Of course, none of this is financial advice. I know Rahul’s looking it up right now, but it’s good to have some perspective.

Dan (58:10) As I furiously read Wikipedia to try to get you a reasonable summary.

Shimin (58:12) There were lots of bubbles between the Civil War and the Great Depression. Every decade was another huge bubble.

Dan (58:23) Looks like Reading went to government control.

Shimin (58:26) some of them did.

Dan (58:27) Yeah, at least during World War I. So I don’t know that, you know, that’s not indicative of the rest of the bubble necessarily, but that’s what happened to Reading. So anyway.

Shimin (58:36) Too big to fail, that’s what I’m hearing. As always, not financial advice. We’re not financial advisors, we just write code.

Rahul Yadav (58:38) You

Dan (58:42) or direct LLMs who write code.

Rahul Yadav (58:45) Yeah, I was looking up this different thing that Warren Buffett had written during the global financial crisis in 2008. And he said, “Over the long term, the stock market news will be good. In the 20th century, the United States endured two world wars and other traumatic and expensive military conflicts, the Depression, a dozen or so recessions and financial panics, oil shocks, a flu epidemic, and the resignation of a disgraced president. Yet the Dow rose from 66 to 11,497.” So it puts those things in perspective.

Shimin (59:24) Yeah, my counterpoint is if you happen to live for that long and your window is 30, 35 years working window.

Shimin (59:31) That got a little dark. No, that’s too dark. Let’s bring it up to the happy side and let’s talk about Two Minutes to Midnight. All that said, how do we feel about the clock this week? We were at two minutes and 45 seconds last week.

Dan (59:42) I’m honestly, this is gonna be uncharacteristic, but I kinda wanna bring it back a little bit more.

And I think it’s the Paul Graham tweet that does it for me.

Rahul Yadav (59:49) seems fine.

Shimin (59:53) Don’t build those data centers. So I’m going to throw back another, I’m thinking 45 seconds, to three minutes and 30 seconds, if you’re good with that.

Dan (1:00:00) I was thinking more just like three, but sure, why not?

Rahul Yadav (1:00:03) What would be hilarious is if the clock is at its most optimistic right when you all of a sudden see a crash. And then we’ll be like, we knew what we were talking about this whole time.

Shimin (1:00:16) Yeah, and then we’ll be like, no, we’ll be like, nobody could have seen this coming.

Dan (1:00:19) No, yeah,

Rahul Yadav (1:00:21) Yeah.

Dan (1:00:21) except for Rahul apparently. Tuesday, April 21st.

Shimin (1:00:27) Alright gang, the clock is back.

Dan (1:00:27) I mean, look, there’s plenty of cracks, right, that keep happening, but I just think that…

Rahul Yadav (1:00:34) Well, the second half of the year is going to be pretty entertaining, between all the IPOs people have geared up and the midterms. They’re going to try and do the IPOs before the midterms, assuming Democrats come into power and then say, what is this bullshit? And all that stuff. So, you know, we’ll find out pretty soon how close we are to it.

Shimin (1:00:58) Right, so what we are saying is the writers have made sure that AI companies have plot armor for the first half of this year just for everything to go down together in the second half, I trust.

Dan (1:01:09) And boy, the finale of this season of LLMs versus America is gonna be one heck of a show, folks.

Rahul Yadav (1:01:14) Hahaha

Shimin (1:01:15) We’ll be here to recap every single episode. But with that said, that’s a show folks. Thank you again for joining us for our conversation this week. If you like the show, if you learned something new, please share the show with a friend. You can also leave us a review on Apple Podcasts or Spotify. It helps people to discover the show and we really appreciate it. Thank you again for listening. We’ll catch you next week. Bye.


Glossary terms in this episode