Episode 23 · May 1, 2026

Why Models Over-Edit Your Code, Meta Keystroke Surveillance, Interviewing Engineers in the AI Age

GPT-5.5, DeepSeek V4, Meta Model Capability Initiative, employee keystroke tracking, mouse movement surveillance, coding model over-editing, Levenshtein distance, Opus 4.6, GPT-5.4, task is not the job, Luis Garicano, Silicon Continent, residual rights, bundling, friction industry, Stanford 2026 AI Index, HAI, AI talent migration, Andy Masley, Nathan Lubchenco, The Future Was Yesterday, interviewing software engineers, take-home tests, personal harness, code review bottleneck, cathedral builder, bazaar shopkeeper, Anthropic Claude Code Pro plan, Sebastian Mallaby, Infinity Machine, Demis Hassabis, AlphaFold, Princeton, Oppenheimer, two minutes to midnight, open-weight cybersecurity, too big to fail

Is GPT-5.5 finally a 4.7-tier model? Did DeepSeek V4 just close the gap with Anthropic? And what does it mean that a senior ML engineer says he can’t out-code Claude anymore? Shimin, Dan, and Rahul are joined by special guest Nathan Lubchenco — ML engineer and Substack author of The Future Was Yesterday — to cover OpenAI’s GPT-5.5, DeepSeek V4 (1.6T base / 49B active params with 1M context), Meta’s new Model Capability Initiative tracking US employee keystrokes and mouse movements, a Levenshtein-distance study on coding-model over-editing, the 2026 Stanford AI Index report, and a deep-dive on how to hire software engineers when the agents are already better at coding than the candidates.

Takeaways

Models are now consistently better at coding than even senior ML engineers, by their own admission. Late-2026 may be when they cross the median software engineer.
Coding-model over-editing is measurable (Levenshtein distance on boolean-flip tasks) and instruction-followable — explicit “minimum-edit” prompts close most of the gap.
The US is, unusually, a slow adopter of a major technological wave. Workplace AI usage is highest in emerging economies, not the developed world.
“The task is not the job” — humans remain indispensable on the bundling dimensions: catching what customers don’t say, and avoiding interactions that end up on social media.
Software engineering interviews should include the candidate’s personal harness, with company-provided API keys for equity. LeetCode optimizes for the wrong signal in 2026.
DeepSeek V4 closing the gap with Mythos in 3–6 months is what makes the bubble too geopolitically important to fail.

Chapters

(00:00) - Cold Open & Welcome
(01:31) - News Threadmill: GPT-5.5, DeepSeek V4, Meta Watches Every Keystroke
(12:28) - Post-Processing: Coding Models Are Doing Too Much
(18:59) - Post-Processing: The Task Is Not the Job (Luis Garicano)
(32:20) - Post-Processing: The 2026 Stanford AI Index Report
(38:11) - Deep Dive: Interviewing Engineers in the AI Age (with Nathan Lubchenco)
(45:05) - Deep Dive: Reforming Software Hiring — Take-Homes, Personal Harness, Equity
(50:15) - Deep Dive: When Models Cross the Median Engineer (Late-2026 Prediction)
(59:29) - Deep Dive: Why Code Review Is the Current Bottleneck
(1:00:21) - Deep Dive: Should PRs Show the Prompt History?
(1:02:27) - Dan’s Rant: Anthropic Tested Removing Claude Code from the Pro Plan
(1:05:44) - Rahul’s Rampage: The Infinity Machine — Demis Hassabis & Corporate Gravity
(1:14:32) - Two Minutes to Midnight: Bubble Clock Moves Back to 4:00
(1:26:30) - Outro

Transcript

Shimin (00:00) And welcome back to Artificial Developer Intelligence, a weekly conversation podcast about AI and software development. We go through hundreds of links each week so you don’t have to and bring you the news and articles most worthy of discussion. My name is Shimin Zhang and today with me are my co-hosts, Dan “the way people actually use a Claude subscription has changed fundamentally” Lasky and Rahul “federally, there’s no limit on worker surveillance” Yadav.

And we have with us this week a special guest, I think from sunny Fort Collins, Colorado, Nathan “the AI philosopher king of Substack” Lubchenco. Hi, Nathan. Welcome to ADI Pod. Nathan has a Substack titled The Future Was Yesterday, where he writes about AI and software engineering. Welcome to the show.

Nathan Lubchenco (00:54) Thanks, excited to be here.

Shimin (00:55) All right, so on this week’s show, as always, we’re going to start with the News Threadmill, where we have a couple of model news items as well as some interesting corporate news from Meta.

Dan (01:04) Yep. And then we’re going to follow up quickly with Post-Processing, where we have a couple of articles this time. One is “Coding Models Are Doing Too Much,” definitive statement. Then “The Task Is Not the Job.” Sounds like my rant about Jira tickets. We’ll save that for another time. And then the 2026 AI Index report.

Shimin (01:24) All right. That will be followed by our deep dive segment where we’re going to interview Nathan about his article, interviewing software engineers in the age of AI, but also how he got into the machine learning AI biz.

Dan (01:39) And then next up, I am going to be ranting about something, maybe. Depends on if Shimin takes away my free Claude Code credits or not. We’ll find out.

Shimin (01:48) Followed by Rahul’s Rampage. Oh wait, this week he is not rampaging. He’s been reading. Weird. Let’s find out what Rahul’s been reading.

Dan (01:56) Hmm. I didn’t know you could read, so that’s an interesting discovery.

Rahul Yadav (01:59) It’s all thanks to AI.

Dan (02:01) And finally, we’re going to hop through two minutes to midnight where we chat about whether or not the AI bubble is bursting or what stage of the bursting we’re in, where we’ve got a handful of articles and we’ll cover those when we get there. So thanks for listening. Let’s dive right in.

Shimin (02:15) Alright. So first up, we’ve got a couple of model news items this week.

Dan (02:20) Yeah, so I guess the biggie is GPT-5.5. I don’t know if anyone else has managed to use it yet, but to me, this kind of feels like a point release, a “we’ve got to catch up with Anthropic.” So I don’t know. I admittedly don’t spend as much time using OpenAI models as I probably should, but it’s

cool to see that they’re at least still keeping up with Anthropic in terms of their point releases. And that’s really all I’ve got about 5.5 unless anyone has a ton of detail.

Nathan Lubchenco (02:47) I was sufficiently disappointed

with Opus 4.7 that it’s gonna make me install OpenCode and try it, so that’s something I guess.

Dan (02:52) Try it.

For personal stuff or work? Yeah.

Nathan Lubchenco (02:57) At work.

Yeah.

Shimin (02:58) ⁓ Yeah, Nathan, you

Dan (03:00) I-

Shimin (03:01) write about using Codex a bit on your sub stack.

Nathan Lubchenco (03:05) Yeah, that was before I switched over to Claude Code. There was a period of time when we didn’t really have easy access to Anthropic models in Claude Code. So I used Codex CLI because that’s what I had access to. Then basically, as soon as I could switch over to Claude Code, I did and thought that was much better. But I haven’t gone back. So I feel like I maybe owe it to OpenAI to give it a try again.

Shimin (03:18) Gotcha.

Dan (03:25) Yeah, that’s how I’ve been feeling. Because one of my buddies that I worked with a while back, whose opinions on code and stuff in general I really trust, is a huge fan of 5.4. Like, he basically dropped Anthropic like it was hot. He’s like, this is so much better. I’m like, okay. But I don’t really know his sort of AI chops, as it were. So his opinion carries enough weight that I really should check it out. I just haven’t gotten to it.

Shimin (03:50) The vibe on the interwebs is that 5.5 is a 4.7-level model, that it’s seriously worth considering switching over or at least trying it to compare.

Dan (03:51) Yeah.

Depending on your opinions on 4.7.

Nathan Lubchenco (04:05) Yeah, I guess I’ll just contextualize. My perspective there is I basically think we might be going through some sort of shift where the models that are best for autonomous work are not necessarily the models that are best for collaborative work. And I think if you’ve spent a lot of time in, like, the centaur mode of highly collaborative approaches, then you might not be well prepared or disposed to know how to use something that is going to be more autonomously focused. And so I think that could almost just be the skill issue that I’m struggling with, at least.

Dan (04:16) Mm-hmm.

Nathan Lubchenco (04:33) with a sort of transition in that way? Because I assume that it’s in fact better. I just don’t know how to use it as well. And I’m not sure if it’s worth investing if, like, Opus 5 is going to come out in three months or something. Like, at what point do we have too much churn of learning how to adapt to new models?

Dan (04:38) You

Yeah, you would think that they wouldn’t.

Shimin (04:49) Mm-hmm.

Yeah, one of the things that was kind of fascinating was OpenAI put out a press release saying, or maybe not a press release, but they tweeted that users should adjust their prompts to better work with 5.5. That 5.5 was sufficiently different from 5.4. You probably want to rethink your workflow. And I’m not sure we have, or the average developer will have the time, to be honest, to go through that every time a new model is released, every third month or so.

Dan (05:16) I have like a pretty janky setup right now where I control the model through like a series of scripts and environment variables and things. And I had manually switched it over to 4.7 and ran it for like a day and then wound up eventually at some point I closed that window and lost the, you know, sort of inline changes that I’d made to my setup, relaunched it and was like, I realized today that I’ve been on 4.6 again all week and I’m like,

You know, I’m kind of OK with that. So I was getting a lot of work done. don’t know.

Yeah, so then the second piece of news we’ve got is DeepSeek has come out with their new V4 model. So if you recall, DeepSeek kind of shocked everyone with their initial offering, which was, like, what, V3 or something like that, where it was almost as good as the foundation models and was significantly cheaper to actually operate and also much cheaper for them to train. So

I’ve been keeping an eye on them ever since. They are now claiming that they’ve essentially caught up to Claude, is their rather large claim. They certainly have caught up in terms of context length: they’re able to support one million with their hosted version of it. But the other reason why I always like paying attention to them is because, at least in the past, they’ve done open-weight distills of this for smaller models like Qwen.

And so it’s kind of neat to be able to just like try it out on your own hardware, which as I’m now famously quoted on LinkedIn, you know, control the means of token production. So there you are folks.

Shimin (06:48) Yeah, I mean, the base model is at 1.6 trillion parameters and 49 billion active parameters. I need a very nice bonus to be able to run that at home.

Dan (06:53) I mean, you can’t run that?

Shimin is GPU

rich compared to all the rest of us, you know, he should be able to run it. Well, he was single, I thought he doubles. Okay. Nevermind then.

Shimin (07:03) Yeah, with a single 4090, right.

single. If I had doubles I’d be doing

more fine-tuning experiments at home. Yeah, exciting. The vibes on V4 seems to be not earth-shattering unlike the original reasoning model but I think we’re also expecting a reasoning version of V4 coming shortly. So we’ll see how that shakes out.

Dan (07:24) Yeah, which

they’ve really been talking up and at least in the press releases too. So we’ll see.

Shimin (07:28) Yeah. All right. The other news item that I think warrants discussing is Meta came out last week with an internal memo stating that they will start tracking US-based employees’ keystrokes and sometimes screenshots. And this is Reuters reporting here.

Dan (07:48) And, and mouse

movements too. Sorry. Keep it together.

Shimin (07:50) And mouse movements. Yes, this is the Model Capability Initiative.

It’s supposedly only running on work-related apps and definitely will not be used for performance assessment or any other purposes. ⁓

Dan (08:03) Definitely.

Nathan Lubchenco (08:05) I have some swamp to sell you.

Dan (08:07) Yeah. What’s funny is I was just listening to one of our old episodes last night while I was painting some stuff. one of the things that we talked about was the Zoom feature that they rolled out where it like measures out your performance in Zoom meetings, like longish spiel and stuff like that. it’s already here folks. Here we are with this version of it too.

Now it’s everything.

Shimin (08:29) Yeah. So part of me wonders, you know, this kind of employee tracking is somewhat common in the blue collar space, right? Like you hear horror stories from Amazon warehouses and call centers.

Dan (08:43) Yeah, or even like call center folks, they’re on, you know, these days, these days call

center doesn’t mean you’re in a cubicle somewhere. means you’re on a laptop at home, like logged into the, the SIP thing or whatever that’s running the call center. And then you’ve got a script up and.

Shimin (08:58) So

speaking of seizing the means of token production, ⁓ is this just the kind of worker control suppression coming for the white collar employees, at least in Meta’s case?

Dan (09:03) You

Okay, so hot take. I actually kind of think this is a good idea.

Shimin (09:15) That’s hot.

Dan (09:15) Would I want to work there? No.

Nathan Lubchenco (09:18) That is hot.

Dan (09:19) After

this. But here’s why, right? So how did LLMs get so good? Or how did they start? They had, you know, read the entire damn internet and every book ever written. Okay, well, now all of a sudden you understand language. Well, how are you going to get good at using a computer? It’s not by reading more English text, right?

Nathan Lubchenco (09:20) That blazing hot.

data.

Dan (09:38) It’s by getting an enormous amount of data about computer usage. And how can they get that? So to me, this is one of the only feasible routes they have to get that. And probably one of the most palatable, because, like, imagine if Microsoft did this or something and they’re like, yeah, we’re gonna watch how you use your Windows machine that you own to train our desktop use model.

Shimin (10:00) Mm-hmm.

Dan (10:00) So I don’t know. I mean, do I want this running on my machine? No, but like I can.

Rahul Yadav (10:01) This is also

But do we want this

on Meta employees’ machines? Yes.

Shimin (10:09) You

Dan (10:10) Can use it in future court cases about.

Rahul Yadav (10:12) They also were not, we don’t have that on the list today, but then the Chinese government recently blocked the Manus acquisition too. And the biggest reason they acquired that was the computer use capability they had, or one of the big reasons. So somewhat related and maybe they were building to that anyways.

Dan (10:21) Yeah, I did see that.

Shimin (10:21) Yes.

Dan (10:32) Yeah, it could be. And maybe that’s actually pushing this. Like, I’m sure the timing of that announcement, they knew it at the time that this was like, you know, being talked about internally. And then the memo just got leaked somehow.

Shimin (10:36) Yeah.

Rahul Yadav (10:43) Yeah.

Nathan Lubchenco (10:43) Yeah.

You also could just like pay people to use computers specifically for this purpose and like build some like RL environments and like, you know, given that, you know, Meta wasted 81 billion on the Metaverse, like they could afford to pay some people to use some computers.

Rahul Yadav (10:52) Bring Mech Turk back.

Yeah.

Dan (10:58) You would think so, but you know, they are paying people to use computers, so why not do both?

Nathan Lubchenco (11:01) Yeah.

Shimin (11:03) Yeah, why pay for it when you can get it for free?

Dan (11:03) Yeah, also

mark this in history that I’m defending that. I think that’s a first for me. I know. Well, Nathan’s here and he’s the king of hot takes, so he just got me going on the hot take.

Shimin (11:07) That was-

Nathan Lubchenco (11:08) I’m

surprised.

Shimin (11:09) Sizzling take. What’s in the hop water today?

Rahul Yadav (11:15) And then they’re also planning to like lay off 10 % of their people, it said, next month. Someone would be furiously clicking buttons in hopes that they’re not at the bottom of the leaderboard.

Shimin (11:22) Mm-hmm. Yep.

Hahaha

Dan (11:27) You

Click the like button 390,000 times and keep your job.

Shimin (11:36) If employees are going to start getting paid in terms of tokens, it’s only fair that they also have to pay the company via user data. All right, guys.

Rahul Yadav (11:46) Hehehehehe

Shimin (11:48) Alright, next up, we’ve got…

an article, “Coding Models Are Doing Too Much.” I think this is from Dan again.

Dan (11:55) Yeah, so

Yeah. There’s a pretty interesting post that made it to Hacker News that I found, entitled “Coding Models Are Doing Too Much.” And I hadn’t really thought about this in detail because I’ve mostly just sort of been, you know, cranking on agentic coding like everybody else and watching the results, but they did a pretty interesting

experiment where, in order to prove that models are essentially doing what they’re calling over-editing, they measured the Levenshtein distance of the code. A modified Levenshtein distance, so they actually, like, I think they’re using Python tokens as the element to calculate the distance from. So for example, if you rename one function from like a to b...

Er, that’s a bad example. Like, a to extremely_long_name_with_lots_of_cheese. The distance would still be 1 because it’s really just a rename. So I thought that was a clever way to sort of normalize for naming differences or stuff like that. So in any case, they calculated this token-level Levenshtein distance on a bunch of different scenarios

in code and they wanted to see how much additional editing the model does for things that don’t necessarily need to be rewritten given a certain prompt. So the results for it were not super surprising to me because they kind of track highly with what I think are good models. But it was pretty interesting to see.

that overall even some of the best models are doing what they call over-editing, right? So it’s making more changes than are strictly necessary to address what the prompt asks. The overall, I believe, winner in terms of the lowest extraneous edits was Opus 4.6. And they tested GPT-5.4, Opus 4.6, Gemini 3.1 Pro,

GLM-5, Qwen3, Kimi, DeepSeek R1, DeepSeek Chat 3, and GPT-5. And the results varied from almost one down to pretty low sub-decimal distances. The other thing that they checked out was, does reasoning have an impact on over-editing? And it seems like it…

kind of bears out that it does, but it’s not like super substantial.
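
To make the token-level idea concrete, here is a minimal sketch, not the study’s actual code. It assumes Python’s standard `tokenize` module as the tokenizer and a plain dynamic-programming edit distance; the function names `code_tokens`, `levenshtein`, and `token_edit_distance` are made up for illustration, and the normalization the study applies to get scores between 0 and 1 is not reproduced here.

```python
import io
import tokenize

# Layout-only token types that this sketch ignores (an assumption, not the study's rule).
_SKIP = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
         tokenize.DEDENT, tokenize.ENDMARKER}


def code_tokens(source: str) -> list[str]:
    """Split Python source into token strings, dropping layout-only tokens."""
    reader = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(reader)
            if tok.type not in _SKIP]


def levenshtein(a: list[str], b: list[str]) -> int:
    """Classic edit distance over sequences: insert, delete, substitute cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def token_edit_distance(before: str, after: str) -> int:
    """Edit distance measured in tokens rather than characters."""
    return levenshtein(code_tokens(before), code_tokens(after))


original = "def a(x):\n    return x + 1\n"
renamed = "def extremely_long_name_with_lots_of_cheese(x):\n    return x + 1\n"
print(token_edit_distance(original, renamed))  # 1: one token changed, however long the name
```

A character-level distance would charge for every letter of the new name, which is why counting tokens is the fairer measure of how much of the code a model actually touched.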

Shimin (14:10) Yeah. And I just want to add, the way they created the test data for this experiment was they took BigCodeBench and they were flipping booleans. And so it’s all synthetic data, which is a pretty clever way of going about it. So the minimum amount

needed to pass is just flipping the trues back into falses or vice versa. So it’s a great way to generate lots of data for this experiment. They did find that GPT-5.4 over-edits the most. I do find that in general,

Dan (14:46) Did I read the

distance backwards?

because I know the bold ones are the best.

Shimin (14:50) You

Dan (14:51) Yeah, I read it right. They just have it sorted really weirdly in that chart. Okay,

Shimin (14:54) Yeah, they had it sorted weirdly. GPT-5.4 had

0.395. So that’s higher than Opus at 06. And I do find that over editing happens a lot, especially if I don’t include something in my prompt to only change the minimum amount necessary. It would just remove my comments. I hate that when it does that, or rewrite entire functions.
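
The boolean-flip setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the study’s pipeline: it uses Python’s `ast` module to flip the first `True`/`False` literal it finds, and the names `FlipFirstBool` and `corrupt` are hypothetical. Because `ast.unparse` normalizes formatting, the reference version is passed through it too, so the two versions differ only by the flipped literal.

```python
import ast


class FlipFirstBool(ast.NodeTransformer):
    """Flip the first boolean literal encountered in the tree."""

    def __init__(self) -> None:
        self.flipped = False

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # isinstance(..., bool) excludes ints such as 0 and 1.
        if not self.flipped and isinstance(node.value, bool):
            self.flipped = True
            return ast.copy_location(ast.Constant(value=not node.value), node)
        return node


def corrupt(source: str) -> str:
    """Return source with exactly one boolean literal flipped."""
    flipper = FlipFirstBool()
    tree = flipper.visit(ast.parse(source))
    if not flipper.flipped:
        raise ValueError("no boolean literal to flip")
    return ast.unparse(ast.fix_missing_locations(tree))


source = (
    "def is_even(n):\n"
    "    if n % 2 == 0:\n"
    "        return True\n"
    "    return False\n"
)
reference = ast.unparse(ast.parse(source))  # normalized original
broken = corrupt(source)                    # minimal fix: flip one literal back
print(broken)
```

Since the minimal repair is known by construction (one token), any extra edits a model makes while fixing `broken` count as over-editing.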

Dan (14:59) Yeah, which is pretty terrible. Yeah.

Shimin (15:17) So this is a very necessary experiment in my opinion. And I do also want to add that they also calculated the cognitive complexity of the edited functions. So if you have multiple nested loops, then the cognitive complexity to understand the code goes up, right? So if the over-edited code has more cognitive

complexity, then not only is it doing more than it’s asked for, it’s making the code base actively worse. So I thought that was interesting as well. Like the fact that, you know, the agents are making the code base worse.
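
As a rough illustration of why nesting drives that score up, here is a simplified proxy, not the metric the study used. It assumes a nesting-weighted count where each branching construct costs 1 plus its current nesting depth; the name `nesting_score` is hypothetical, and unlike the full cognitive complexity definition it treats an `elif` as a nested `if` and ignores boolean operators.

```python
import ast

# Constructs that count as a branch in this simplified proxy (an assumption).
BRANCHING = (ast.If, ast.For, ast.While, ast.ExceptHandler)


def nesting_score(source: str) -> int:
    """Sum (1 + nesting depth) over every branching construct in the source."""

    def walk(node: ast.AST, depth: int) -> int:
        total = 0
        for child in ast.iter_child_nodes(node):
            if isinstance(child, BRANCHING):
                total += 1 + depth              # deeper branches cost more
                total += walk(child, depth + 1)
            else:
                total += walk(child, depth)
        return total

    return walk(ast.parse(source), 0)


flat = (
    "for i in range(3):\n"
    "    pass\n"
    "for j in range(3):\n"
    "    pass\n"
)
nested = (
    "for i in range(3):\n"
    "    for j in range(3):\n"
    "        if i == j:\n"
    "            pass\n"
)
print(nesting_score(flat), nesting_score(nested))  # 2 6
```

Comparing the score before and after a model’s edit gives a crude signal of whether the extra changes made the function harder to follow.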

Dan (15:54) Yeah, although like to some degree, yeah, I was gonna say to some degree, does that matter?

Nathan Lubchenco (15:55) for humans to read. ⁓ And so I guess I

find this is very likely a very temporary problem since we’re still in augmentation era and still reviewing code. But I guess I could imagine a world very soon in which we are not reviewing code very much, or we might even want code to be optimized to be read by models and not by humans.

Shimin (16:03) Mm-hmm.

Nathan Lubchenco (16:16) So it’s like, worse for whom? Like, it might not actually be worse for LLMs to read.

Shimin (16:21) Right. So this is actually, if I have more free time, I will run this experiment, which is like, does human readability also equate to AI readability? My heart says yes, but that could be.

Nathan Lubchenco (16:34) It didn’t used to,

and at least in my experience, because basically, think earlier versions of the models really struggled with abstraction. so violating, like do not repeat yourself, made things actually notably better for the models to copy paste and just have everything inlined. I think this has changed where it’s not nearly as true as it used to be, but at least at some point in time, I had strong opinions that like,

model readability was different than human readability, but to your point as the models get better, they might be much more closely correlated.

Shimin (17:04) Right. And, but don’t repeat yourself comes with a trade off of context length, right? Now they need to look at more files potentially to, to get the same amount of data. So, and on the other hand, I do think since so much of, the training data is minified code, maybe they are very good at it. I don’t know. This is like an interesting open research question, I think.

Nathan Lubchenco (17:25) Yeah.

Yeah, I agree. But I think we should be more more interested in this because I think the more code that is written, I basically think our coding standards of what constitutes a good code, I think if that idea doesn’t change at all, it’s very likely that we’ve missed some opportunity.

Shimin (17:41) Right. And the last conclusion they drew was that reasoning models, even though they’re prone to over-edit, they found, especially in the case of GPT-5.4, that by telling the model not to over-edit, it follows the instruction very closely. So if that is an issue, listeners, add it to your agent’s markdown

to not over edit. I’m actually going to do that now that I’ve read this article. Yeah.

Dan (18:07) And they also

tried some additional techniques to tune open-weight models, I think, like RL and a couple other pieces, to remove over-editing, which I’ll let you guys discuss because it rapidly gets outside of my

Shimin (18:22) Okay, let’s move on to our next Post-Processing then. This one is brought to us by Rahul, called “The Task Is Not the Job.” Great post, by the way.

Dan (18:29) The ticket is not to

Rahul Yadav (18:30) The task

Dan (18:30) work.

Rahul Yadav (18:31) is not the job. Sorry, what’d you say then? The ticket is not the work. The task is not the job. By Luis Garicano from Silicon Content on Substack. This is continent, sorry. ⁓

Dan (18:34) The ticket is not the work.

continent.

Mm-hmm.

Rahul Yadav (18:51) The thing that they’re talking about is we measure how much AI can accomplish in terms of each task it does, and then you have METR and all those graphs and everything. But jobs are more than just a bunch of tasks. And even if an AI can automate, say, a specific task or even a few specific different tasks, it misses the interconnectedness of things. And that’s where a lot of human value

comes from, is that we do tasks as bundles instead of this, like, I did task A completely in isolation, then I can do task B, task C, so on and so forth. And they distinguish between different types of bundles. So if you have, like, you know, travel booking, where you can now book your own flights, people have moved to luxury travel planning where, you know, some human would

do all of that for you. And then we’ve talked in the past about how, if negotiation in a real estate case becomes an AI thing, then the real estate manager’s job is to handle those human things, building relationships and all that, that would feed into the negotiation. One

pretty like useful concept that they talk about is this concept of residual rights

And the concept of residual rights is you have the authority to decide matters that contracts and processes have not specified in advance. So some examples they give is like say construction contract says project must be delivered by March, but it does not say what happens when the electrician and the plumber both need access to the same wall on the same day. So there’s a lot of these.

you know, when things go wrong and there is no written process, who actually takes care of this, who has the authority, who makes the decision, who gets blamed if the decision goes wrong, all of those things. These are not tasks, but these are more like institutional features that we have. And it’s not an easy thing to translate these to an AI world. Slightly related to this,

but not directly tied to this article is I was reading about how.

Creating, like, super high friction is a $160 billion industry. So think about how easy it is to sign up for Comcast internet. You can get it within the same day, but try canceling it. You’d lose your mind. And the same thing with dealing with, like, you know, a flight cancellation when that happens, or dealing with health insurance. We all know all sorts of these cases where onboarding onto it is easy, but offboarding is

Dan (21:15) You

Shimin (21:16) Ha

Rahul Yadav (21:28) very

hard. And they’ve made that into such a big thing that it’s, like, a $160 billion industry now.

like how much money people make off of that. And so as AI becomes even more embedded in these different processes, that problem likely will become even worse. Because then you can move your humans even farther away from, instead of talking to human to cancel your Comcast account or anything, you’ll have to talk to an AI until you try and get it to give up at some point.

waste you know tens of hours of your time or hundreds before you can actually get it to do something.

Problems like that are probably going to get worse. this is another form of that residual decision right, where that’s slowly like people are moving it farther and farther away from the end user in terms of, you know, when you want to cancel your service. And I think that is bound to get worse until courts have to step in and kind of set some precedent that you can’t do this.

Shimin (22:30) Yeah, so we should be selling prompt packs about how to hack the customer service agents to do the thing we want them to do. That’s what I’m hearing.

Rahul Yadav (22:36) Yeah.

Dan (22:39) Ignore previous instructions. The car will be free.

Rahul Yadav (22:40) Actually there was…

Perplexity was trying to, you know, Perplexity has that computer use feature. So people were using that to buy stuff on Amazon or something. And then they were even spoofing the user agent string, and Amazon detected it and sued Perplexity. And in March, there was a decision by the court that Perplexity was in the wrong because they didn’t really disclose it. So that was the closest precedent we have to this in terms of,

Dan (22:48) Mm-hmm.

Rahul Yadav (23:12) what your agents would be able to do versus their agents. But if you look at like the trend from the past, ⁓ companies use these agents to waste a lot of our time, but we cannot do that as end users and just be like, hey, I would like to spend these 10 hours or something other than waiting on the call so that I can talk to some person who might still tell me sorry, I can’t do anything about it. those are the kind of problems that are about to get worse.

Dan (23:36) But now we could build our own agent to talk to them.

Rahul Yadav (23:39) But you cannot, you might

not be able to do it, is my point. I think you might have to give it a limited power of attorney or something for what it can do on your behalf. Yeah.

Shimin (23:43) Yeah, cuz…

Dan (23:49) Whoa.

Shimin (23:50) It will be in the terms of

service that you cannot communicate with the company via an agent. The terms of service that we never read. Yeah.

Rahul Yadav (23:54) Yeah. Yeah.

Dan (23:54) But isn’t

that already happening though? Like, Google had that crazy thing that would get restaurant reservations for you or whatever, just through tapping on your phone. Like, it would call. I don’t think so, but I mean, like, you didn’t have to give it power of attorney to get reservations for you.

Rahul Yadav (24:08) But it didn’t really work out, did it? Yeah.

Shimin (24:09) Yes.

had a buddy…

Rahul Yadav (24:17) It’s Google. If I had Google Money, Dan, you know, I’d

be doing this left and right.

Dan (24:21) You

Shimin (24:23) Yeah, I had a buddy do that the other day for our dinner reservation. He had his OpenClaw agent call the restaurant for a reservation and got it all through. So, you know, they’re not catching up just yet. Um, it went through, like we got a reservation. Yeah. Yeah. It was interesting. Um,

Dan (24:34) Did you, did you get, it didn’t go through, is that what you said? Oh, oh, wow. That’s pretty

crazy.

Rahul Yadav (24:43) Yeah, sorry, I went on a tangent there. Just to wrap the article up, the conclusion is still, you know, we’re going to have more human-related jobs, relationship-related jobs. So start polishing those social skills that you don’t have. Because I think that’s what we’re all supposed to do once AI takes over these jobs.

Dan (24:59) ⁓ dang.

Shimin (25:09) Yeah, I think the

money quote here for me personally is this quote halfway through the article. Human agents are still indispensable on the two dimensions that matter the most. Recognizing what the customer is not explicitly saying and avoiding the kind of interaction that ends up on social media. And this is regarding a Chinese customer service firm that does a lot of customer service. Right. So, you know, the centaur model is

Rahul Yadav (25:27) Yeah.

Shimin (25:35) still correct, we probably still need to have human in the loop to fill the gaps in the AI’s jagged edges.

Dan (25:43) The other kind of funny reverse problem that I think we’re having with this, that’s already happening, is my car had a recall on it. And you have to go to the dealership for recall repairs. So I contacted the dealership and someone texted me within, like, probably three minutes. And the first thing that they said in there was, like, I am a real person, not an AI agent. Like, please text me back. And I was like, okay.

Shimin (26:05) Ha

Dan (26:09) And so I did and it was a real person, like, I just thought that was hilarious that like, clearly they’d gotten enough people, like, I guess just due to the way their system is set up where they just like sent you to a person so quickly thinking that it was probably an agent. I don’t know. Just kind of fun.

Rahul Yadav (26:19) Yeah.

Shimin (26:20) Mm-hmm.

That’s like the clear

second order effect when communication becomes free.

Dan (26:29) Yeah.

I’m real, promise.

Nathan Lubchenco (26:31) Yeah, I’ll just add real quickly that

I think like the bundling model is like super valuable to think about for sort of what comes next. But my concern is it’s going to break down if AI gets strong enough to do more and more of the bundles. So I think it’s going to be explanatory in the short term and not quite as protective as people might think or want it to be in the long term.

Dan (26:51) And isn’t that why they’re scared about releasing Mythos? It wasn’t that it was particularly good at cybersecurity, it’s that it had the chaining ability, right? To be able to drop like 32 exploits at once or something that you needed to be able to crack something. Like, isn’t that effectively this bundling?

Rahul Yadav (27:05) Huh.

Nathan Lubchenco (27:06) I think that’s part of it. I think

for the UK security range plan, it was the first model that succeeded in completing all the steps for planning the cyber attack. I do think 5.5 did pass one out of 10 though, and Mythos was only three out of 10. And I think it’s one of those interesting threshold capabilities where one out of 10 is maybe enough to still cause significant issues. But yeah, for the exploit creation, I do think it was chaining many exploits together in order to make the

Dan (27:35) Right.

Nathan Lubchenco (27:35) vulnerability.

Dan (27:36) So maybe that’s like simplistic, but I see that as like it required researching and understanding and things and then putting them all together, which is like, I don’t know if that’s how I would define a bundle, I guess. I don’t know. Maybe not.

Rahul Yadav (27:49) And part of this, I think, if you want to successfully accomplish the bundling thing: let’s say, you know, Reddit had closed off its APIs and stuff, and then you’ve seen, or I have seen at least, Jira has made their APIs more restrictive, and I think a bunch of other companies are doing it too. I can

see a future where the products that you use have, like, very restrictive APIs and all that, so that it almost limits how much you can do with them. And if you want to truly do a job end-to-end using agents, they might run into those problems. And then maybe you just get a solution like Google Workspace that has all that it has today but all the other things too, or like, you know, Microsoft or whatever, and then within it, it

doesn’t really need to reach out to anything else. Because every time you have to reach out to some external party, if they don’t really play along, then you’re also limited there, and then you maybe need some human intervention in these things. So there are all sorts of those weird things that we humans bridge today that you can’t really easily accomplish if that world comes to be.

Dan (29:07) Yeah. Although the downside

is if Microsoft does that, their agent is unfortunately just for entertainment purposes only, regardless of how good the ecosystem it’s playing in is. Sorry, that’ll never get old.

Rahul Yadav (29:14) entertainment purposes only.

Shimin (29:16) You

Rahul Yadav (29:21) Yeah.

Shimin (29:22) ⁓

one last go ahead

Rahul Yadav (29:24) But between Google

Stitch and Gemini and all the coding agents and everything, companies building their own internal tools and all that, it could be that you end up having fewer things because of these. You can reduce all these coordination costs and everything. That then can unlock that whole, you can do the whole bundle instead of just individual tasks.

Shimin (29:50) Yeah, I went to an AI go-to-market meetup last week locally here in Seattle. And folks were talking about AI agents that they’re building to do end-to-end marketing tasks for them. And one particular guy was mentioning how he had a promotion marketing agent for his free Apple apps. And the agent

did what it was supposed to do, which was trying to get free newsletter advertisement for his app. But it was being so insistent that he had to then go back and write an apology to the newsletter author saying, you know, it was my agent. I’m sorry. It was being kind of a dick about it. So it’s not quite there yet, but folks are trying to eat the whole bundle.

Dan (30:28) Ha

Rahul Yadav (30:34) Yeah.

Shimin (30:36) Let’s talk about the 2026 AI Index report. This is from Stanford University’s HAI, the Human-Centered Artificial Intelligence group. It is a 423-page PDF. So unfortunately, I did not spend the entire weekend parsing through every single bloody page of this thing. But I did…

Dan (30:58) Are you sure he didn’t read

Rahul Yadav (30:58) You said 423

Dan (31:00) all of them?

Rahul Yadav (31:01) pages?

Dan (31:01) Yeah.

Shimin (31:02) It’s

real long, but there are some really interesting findings from this report that I want to just bring to everyone’s attention, both from the outline that I’m showing on the screen right now for the web version and also from the actual PDF itself. So let’s see. The first one that I thought was really interesting, or surprising I should say, is that

while the US has been leading in AI investment, its ability to attract global talent is declining. And here the money quote was, the number of AI researchers and developers moving to the US has dropped 89% since 2017, with an 80% decline in the last year alone. I wonder why that would be? Question mark, question mark. Very, very surprising.
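The two percentages in that quote can look contradictory at first glance. A quick worked calculation reconciles them, assuming (this is an assumption, not something stated on the show) that both figures describe the same annual inflow $I$ and that the 89% drop is cumulative and includes the most recent year:

```latex
% Assumption: both percentages describe the same annual inflow I of AI
% researchers and developers, and the 89% drop is cumulative since 2017.
\begin{align*}
I_{\text{now}}  &= (1 - 0.89)\, I_{2017} = 0.11\, I_{2017} \\
I_{\text{now}}  &= (1 - 0.80)\, I_{\text{prev}} = 0.20\, I_{\text{prev}} \\
\Rightarrow\quad I_{\text{prev}} &= \frac{0.11}{0.20}\, I_{2017} = 0.55\, I_{2017}
\end{align*}
```

Under those assumptions, the inflow had fallen by about 45% through the prior year, and the most recent year alone took it from roughly 55% of the 2017 level down to 11%.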

Dan (31:43) Hmm.

Shimin (31:47) The most surprising fact here is that generative AI reached 53% population adoption within three years, faster than the PC or the internet. But the United States ranked 24th at 28.3% adoption. So the US, unlike in most of our previous technological revolutions, is actually a slow adopter. And I think the United States probably has one of the strongest anti-AI vocal groups of any country in the world.

And what about for developers? Here too, there are some surprising notes.

AI engineering skills are accelerating fastest in the UAE, Chile, and South America. So this is not something that US developers have a world leading edge on, which I thought was really, really shocking. And lastly,

Dan (32:35) Well, the keyword there is acceleration

too, right? Like, I would argue that that’s the derivative of velocity, right? So if you’re already moving pretty fast, acceleration, it’s like you’ve got to sprint to catch up. Anyway, that might just be my highly Americanized view of things, but.

Shimin (32:42) that’s true. That is true.

Yeah, this is the part where I didn’t dig into the…

This is the part where I didn’t dig into the research methodology and find out if acceleration is actually a second order ⁓ number here. Really bites me. Okay, a few more.
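Dan’s point about level, velocity, and acceleration can be made concrete with a small sketch. The numbers below are made up for illustration and are not taken from the AI Index report; the report’s actual methodology for "accelerating" was not checked on the show. The sketch just shows that a region can lead on acceleration while trailing on both level and yearly growth.

```python
# Hypothetical index values for two regions over four years (illustrative only,
# not figures from the AI Index report).
leader = [60.0, 70.0, 80.0, 90.0]   # high level, steady growth
chaser = [5.0, 7.0, 12.0, 20.0]     # low level, growth speeding up


def diffs(series):
    """Return successive differences: series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def summarize(series):
    """Report the latest level, first-order change, and second-order change."""
    velocity = diffs(series)         # change per year
    acceleration = diffs(velocity)   # change in the change per year
    return {
        "level": series[-1],
        "velocity": velocity[-1],
        "acceleration": acceleration[-1],
    }


leader_stats = summarize(leader)
chaser_stats = summarize(chaser)
print(leader_stats)
print(chaser_stats)
```

In this toy data the "chaser" has the higher acceleration while still sitting well below the "leader" in both level and yearly growth, which is the distinction Dan is drawing.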

Dan (32:59) Yeah.

See, should have read all 423 pages. Do you even call yourself a

podcast host? I mean, come on.

Shimin (33:07) When it comes to how people do their jobs, 73 % of experts expect a positive impact compared to just 23 % of the public. There is a 50 point gap. Among the surveyed countries, the United States reported the lowest level of trust

Dan (33:24) Ha

That’s pretty wild.

Shimin (33:32) in its government to regulate AI at 31%. The EU has the highest ranking to perhaps nobody’s surprise, but they tend to over-regulate from time to time.

Dan (33:40) Mm-hmm.

Rahul Yadav (33:40) Can.

I was watching Dylan Patel of SemiAnalysis, who went on whatever that Patrick O’Shaughnessy podcast is, I don’t remember the name. And he was talking about how the general population doesn’t know who Dario and Sam are, doesn’t know Anthropic and OpenAI and all that. But when they see that

these people are going on podcasts and all that and claiming your jobs are going to go away within, you know, two, three years or whatever, AGI will be here and it will have almost everything that they hear are negative things about what’s going to happen for their life, their jobs and all of that.

The reaction you got from the common public about the attacks on Sam Altman’s house, that’s what you get. Because he was calling out that the comments from people were more like cheering the people on, which is not what we want. That’s the, yeah. And so this matches pretty well with that, where the public is just not as on board

Dan (34:35) Yeah, the Instagram comments or whatever. Yeah, I saw that.

Shimin (34:37) Mm-hmm.

Rahul Yadav (34:46) as we would want them to be and for these reasons.

Shimin (34:51) Yeah, we didn’t mention that attack on the podcast here. I will say I don’t say a ton of really nice things about Sam Altman on this podcast, but I was really surprised at how modest his residence was in San Francisco. I thought someone with that much money would have a compound somewhere in the Bay Area. And I saw the picture of the house on CNBC. I was like, he’s like one of us.

Dan (35:05) That’s your takeaway from it.

Rahul Yadav (35:06) you

Dan (35:14) Do know how much

houses cost there? Turns out that house is actually, what, 4.7 million?

Rahul Yadav (35:14) He doesn’t have that much money. It’s all in options and all that.

Nathan Lubchenco (35:19) I’ll just say it obligatory,

political violence is never the answer.

Dan (35:22) Yeah, thank you. That’s where I thought you were going with that, Shimin

Rahul Yadav (35:23) Agreed.

Shimin (35:23) Right. Thank you.

Rahul Yadav (35:25) use your words when you disagree use your words. Yeah

Shimin (35:26) I mean, of course, you don’t have to say that. Somebody needed to. And our guest. OK. A few more facts buried deep in the PDF that I want to also mention. They found that the “AI replaces entry-level developers, or entry-level employees, first” thesis is starting to show up in the data.

Dan (35:29) I mean, maybe we do these days. Yeah.

Nathan Lubchenco (35:29) feel like I needed to.

Shimin (35:48) They found that the internet tipped past 50% AI-generated content somewhere in January 2025. I mean, it was something that we’ve long suspected to be true, but it’s in the data now. Further proof that America is not embracing AI the way the rest of the world is: workplace AI usage is higher in several emerging economies than in many advanced ones. In 2025,

58% of employees globally reported using AI at work on a semi-regular or regular basis. But in India, China, Nigeria, the UAE, and Saudi Arabia, the share exceeded 80%. So I think there is this weird global divergence where the developing countries are really, really embracing AI, whereas the developed nations are more like, let’s hit the brakes. Let’s slow down. Let’s think about what the ramifications are.

I’m glad that’s showing up in the data as well.

Dan (36:42) Well, people did stand in line in China to have OpenClaw installed on a machine for them, which is pretty wild.

Shimin (36:51) That is pretty wild.

Nathan Lubchenco (36:52) Have you talked at all on the podcast before about sort of the unintentional form of misinformation from how, when people finally do, you know, publish academic studies, they’re still using GPT-4o mini or something, and how so much of

public perception is shaped by these well-intentioned places where people maybe reasonably got good data from before, but it’s almost impossible to, because by the time something has actually been peer-reviewed even, it’s already pretty stale. And I think that’s one of the other things that contributes to some of the misunderstanding of the public. And then not to mention the whole data center water usage. I’ll make a plug for Andy Masley’s blog. He’s done more to try to correct

Shimin (37:08) Mm-hmm.

Nathan Lubchenco (37:34) information about data center water usage than anybody, so check that out if you’re curious about that. It’s not that bad.

Shimin (37:40) You’re telling me my tokens are not actually causing drought in Colorado right now.

Dan (37:41) Yeah.

Nathan Lubchenco (37:46) They’re not.

They’re not. Your tokens are not the problem. I feel quite confident.

Shimin (37:49) Let’s try it.

Okay. Well, let’s move on to our interview with Nathan. Okay, go ahead.

Rahul Yadav (37:57) Wait, can I?

This thing didn’t need the first 10 pages and the last like 30 of them. So that’s all. If you scroll all the way to the front, the top of this, they’re just big nothings. That’s how you get a 423 page report.

Dan (37:59) WEEKS!

Shimin (38:08) Yes. Yes.

Dan (38:14) Ignor-things.

Shimin (38:17) I know that’s why I skimmed it and brought you guys the most interesting factoids. Cause a lot of it is like more or less like, yeah, we know this, but this is not for every, it’s not just for the AI developer kind of niche, right? This is not for everybody.

Rahul Yadav (38:21) Yeah.

Yeah.

Nathan Lubchenco (38:30) Yeah, I still think a lot of people still believe that AI is hitting a wall. Like, you know, people want to believe what Gary Marcus has to say, and they just want to believe that capabilities are not going to improve. So actually, even though the top line, “AI capabilities are not plateauing,” is super boring and obvious, it’s really rare that people understand that.

Shimin (38:50) I have a soft spot for Gary’s Substack, Nathan. I don’t appreciate that kind of language. I disagree with most of it, like 95% of it, but you know, he’s fighting a good fight out there. Yeah.

Nathan Lubchenco (38:54) Okay. All right, I poked the wrong thing.

Dan (38:58) You

Nathan Lubchenco (39:02) Okay, we’ll

agree to disagree probably.

Rahul Yadav (39:05) future podcast guests.

Dan (39:06) Compatible guest about that interview.

Shimin (39:09) We can have a fight about it.

All right, now let’s agree on a few things. Most importantly, the importance of philosophy in daily life. Nathan, why don’t you tell us about your journey to becoming an AI software engineer.

Nathan Lubchenco (39:23) Sure. Yeah, basically, as you mentioned, I got my start in philosophy, and I ended up studying a branch of philosophy called experimental philosophy, which basically asks this question: a lot of philosophers say things like “it’s intuitive that,” and experimental philosophers are like, huh, I wonder if people actually find that intuitive. So they began to use increasingly more empirical methods, starting with psychological surveys and then eventually using more advanced statistics to try to understand how human behavior and data can end up informing philosophical questions.

So that led to me doing what I consider my first data science project, which was investigating lending patterns in Kiva microfinance loans to think about how people think about distributing aid. And then it turned out that academic philosophy is super hard, and I eventually managed to find my way into tech

as a data scientist for various startups. But then I ended up being pretty constrained by needing to get the data and became more and more interested in data engineering. And then I also just found software engineering to be much more of a collaborative team sport. So, taking a data science background plus data engineering, I found working as an ML engineer to be a really promising, good fit for all of those. So yeah.

Shimin (40:33) Mm-hmm.

Who’s your favorite philosopher? Don’t think, just say it.

Nathan Lubchenco (40:37) David Lewis.

Did amazing work on counterfactuals. Probably one of my favorite facts of counterfactual logic is that transitivity doesn’t hold. So ordinarily, if A then B, and if B then C, gives you if A then C. But in counterfactual logic, with “if it would have been the case that A, then it would have been the case that B,” transitivity breaks down, and that’s just a surprising and interesting fact.
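A toy model makes the failure of transitivity easier to see. The sketch below is not Lewis’s full semantics; it assumes a simplified "closest world" reading in which worlds are strictly ordered by similarity to the actual world and there is always a unique closest one. The world names and truth assignments are hypothetical, chosen only to produce a counterexample.

```python
# Toy possible-worlds model (all names and worlds are hypothetical).
# Worlds are listed from most similar to the actual world to least similar.
WORLDS = ["w0", "w1", "w2"]  # w0 is the actual world

# Which propositions are true at which worlds.
TRUE_AT = {
    "A": {"w2"},
    "B": {"w1", "w2"},
    "C": {"w1"},
}


def would(antecedent, consequent):
    """'If antecedent had been the case, consequent would have been':
    true when the consequent holds at the closest antecedent-world."""
    for world in WORLDS:  # scan outward from the actual world
        if world in TRUE_AT[antecedent]:
            return world in TRUE_AT[consequent]
    return True  # vacuously true: no antecedent-world exists


a_then_b = would("A", "B")
b_then_c = would("B", "C")
a_then_c = would("A", "C")
print(a_then_b, b_then_c, a_then_c)
```

The closest A-world (w2) is a B-world, and the closest B-world (w1) is a C-world, but the closest A-world is not a C-world. Both premises come out true and the chained conclusion comes out false, because evaluating each counterfactual can send you to a different world.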

Dan (40:38) Who’s yours, Shimin

Shimin (40:57) Gotcha.

Dan (40:58) Who’s your favorite, Shimin?

Rahul Yadav (40:58) This is too

smart for this podcast, Nathan. You’re going to have to bring it down by like 50%.

Shimin (41:02) Yeah, that’s a one.

Dan (41:02) You

Nathan Lubchenco (41:03) You can cut it, you can cut it, it’s fine.

Shimin (41:09) And since

Dan would not let me discuss this on the podcast, Nathan, are you familiar with Claude’s soul or constitution document, and that Anthropic has a philosopher on the payroll? How do you feel about that?

Nathan Lubchenco (41:23) I feel great. I think philosophers should be involved more in pretty much all aspects of life. I think having people who, you know, think critically and deeply about the really important issues is very valuable. There’s some, yeah, I think this is great. I think this is actually probably going to be underrated for how important it is going to be to making the future go well.

Rahul Yadav (41:43) Have you been, I’m curious if you’ve been able to teach your colleagues some of these, know, tricks is not the right word, but how to think about these things philosophically and some things that you’ve learned that others are missing.

Shimin (41:43) I agree 100%.

Nathan Lubchenco (41:59) Yeah, that’s an interesting question. I mean, basically, I think I try to model the behavior of doing a lot of rejecting the premise, trying to ask clarifying questions: how are we sort of framing things? One of the most important things I have learned in my journey is even just the idea of how important framing and loss aversion are. So if you sort of like frame, a lot of like status quo bias.

Rahul Yadav (42:07) Mmm.

Nathan Lubchenco (42:22) But I don’t know. I’m not sure if I’ve actually been very successful in any of that. But ⁓ I don’t know. I try to use those kinds of things to approach problems.

Rahul Yadav (42:31) Nice. Yeah, more, you can’t be successful if the other person’s not receptive. I was just more curious if it shows up in your everyday interactions and if other people picked it up and you know, yeah.

Nathan Lubchenco (42:41) Yeah, I don’t know. I’ll

put Dan on the spot if Dan feels like this has happened at all.

Rahul Yadav (42:45) Yeah.

Shimin (42:45) Hahaha

Dan (42:47) I never really

noticed anything like overtly. Yeah, so full disclosure, Nathan and I worked on the same team for several years. We still work for the same company. But I wouldn’t say that I recall you doing that, but I certainly enjoyed working with you when we were working together daily. I don’t know. Maybe it bled through and I just didn’t even know.

Nathan Lubchenco (43:10) That’s a question, I’ll think about it more.

Shimin (43:10) All right.

Yeah, let’s bring things back to AI. So Nathan, you first wrote about spec-driven development on your Substack back in, I think, June of ’25 last year. At the time, how long did you think it was going to take the industry to adopt spec-driven development fully? And how did you feel about the rapid pace

Nathan Lubchenco (43:24) Mm-hmm.

Shimin (43:36) at which our entire industry has actually adopted spec-driven development?

Nathan Lubchenco (43:40) Yeah, I’ve been really surprised. I guess I feel like a lot of my, you know, writing and thinking in 2025 was sort of feeling like things were lagging behind. I was sort of frustrated by how little it felt like some people were thinking about these kinds of things. And it really felt like, you know, the phase change happened in maybe like November of 2025 with, you know, Claude Code and whichever Opus came out around then. And I feel like a lot of people who’d previously been very skeptical

Dan (44:05) 4.5, I think, was the step change. Yeah.

Nathan Lubchenco (44:08) really sort of shifted, so I did not anticipate how much things would change, because I thought it was definitely going to take longer to sort of get along that direction.

Shimin (44:15) Yeah, I think I was thinking like it will take one to three years and it happened in six months. Like I still look back kind of shocked. Yeah.

Nathan Lubchenco (44:20) Same. Yeah.

Yeah, and it turned out so much better

than I imagined it also, so that’s also good.

Shimin (44:28) Right. Yeah. Okay. Why don’t we do our deep dive on your Substack post here titled “Interviewing Software Engineers in the Age of AI,” where, you know, we all know the software hiring process

is broken and has been for probably a long time. And, especially with the help of AI, you know, we need to really shape things up to make sure that we are hiring the right developers for our teams. So, I guess, you know, personally, I’ve never had to write a quicksort from scratch at any company, ever.

Nathan Lubchenco (45:06) Yeah, it was rough. I did not do well. ⁓

Dan (45:08) How about

balancing an AVL tree?

Shimin (45:10) Yeah, all that. Yeah. And every time, looking for a new job is like a six-month process of let’s read all the books and let’s just, you know, crank LeetCode for a long time. Given all that and where AI is today, what are some proposals that you have for how we should hire software engineers?

Nathan Lubchenco (45:27) Yeah, I mean, I think concretely, I’m very interested in the idea of moving more towards take-home tests and then potentially discussing those take-homes with people. I think…

In general, I think there’s just this big gap between the performance that you have to put on in an interview and what people are actually capable of. And, you know, if anything, people are pairing less and less than they used to. And so our ability to use the tools and think through a problem while another person is, like,

actively staring at us and, you know, maybe being somewhat hostile makes it, I think, so much harder to do our best work. And I think working entirely within the tools that you’re familiar with, in the exact way you want to, is a way to understand much better what people are actually capable of. But then I think it’s critical to do the next step, which is: did the person understand

at all what they’ve given to you, and can they talk you through and navigate a discussion of what the trade-offs were, how they went about thinking about it, and all those sorts of things? I think that’s one way in which we can get potentially a lot more signal. And I think even the creation of take-homes can be made a lot easier for the interviewing devs too, because I think one of the reasons LeetCode questions have persisted for so long is it’s been so hard to get, you know, hiring engineers to invest more into creating more realistic versions of things.

Shimin (46:18) All right.

Nathan Lubchenco (46:46) But when you can create code bases from scratch, you can then potentially have a much more realistic situation: how can someone navigate an existing code base? Did they stick to existing conventions? Did they discover bugs? Did they leave this thing better than they found it? So I think there’s just a lot of possibility in terms of what is even possible from an interviewer perspective to get better out of people.

Shimin (47:06) Mm-hmm.

Yeah, do you suppose the, you know, the interviewee’s personal harness should be a part of the interviewing process?

Nathan Lubchenco (47:16) I think so if possible. But it’s tricky. I mentioned this in the blog post, too, that I’m really concerned about equity here. And so I think it would be important that companies provide API keys so that people can always have time to practice with the best models if they haven’t had that opportunity, and then to use their own personal setup. Because I do think right now, my

Shimin (47:22) Mm-hmm.

Nathan Lubchenco (47:37) effectiveness with my own personal Claude Code setup, versus if I had Claude Code but it had none of the skills that I’m used to using and other things, is very different. You know, I rely heavily on the Obra Superpowers brainstorming skill, and I almost always, you know, do the exploration, go back and forth, make the design plan, make the implementation plan, and that’s just what my workflow is and how I understand how to, at this point in time, work well with

Shimin (47:59) Mm-hmm.

Nathan Lubchenco (48:05) the models. And so it just feels like since that’s like the tool now, I mean the closest I can sort of think of it’s like the equivalent of like the people who sort of insisted in like interviewing people in like Google Docs, you know, or like whiteboarding as opposed to like using, you know, an IDE or something. Like you’re getting some sort of signal but like is it a signal that anybody like really cares about? So I think if possible, yes.

Shimin (48:27) Yeah, speaking of Superpowers, I was looking at Claude Code’s prompt the other day, and it’s got the “propose two to three different solutions and have the user decide which one they want to pick” step, which I think I first saw in Superpowers. That’s when it was like, originally, my God, this makes me feel like I’m doing work and contributing. It has now made its way into Claude Code itself. Yeah, super interesting.

Nathan Lubchenco (48:41) Mm-hmm.

Yeah, maybe those things

maybe become less and less important, but assuming you maybe have one of the things that hasn’t been adopted or something, yeah.

Shimin (48:55) Yeah. Yeah.

Dan (48:58) It’s funny how much Claude

Code is gradually eating the world. Something good comes out; three weeks later, it’s adopted into Claude itself. Like the harness. Yeah. Yep. Yep.

Nathan Lubchenco (49:07) Well, yeah, they turned like the Ralph Wiggum loop into just like slash loop, which, you know, just makes

sense.

Dan (49:13) Yeah, and then channels is sort of like taking a little shot at like, you know, running your bots on Telegram and stuff like that. Like the claws of the world.

Shimin (49:20) Right. And I’m glad you brought up discussing engineering trade-offs because I feel like day to day, like, you know, 60 % of value I provide as a software engineer is actually discussing trade-offs and exploring the different solutions and finding the one that actually fits the spec. So, um, yeah, it’s really good to see.

Nathan Lubchenco (49:38) Yeah, with implementation being

so cheap, you know, I feel like that’s like where like sort of like the taste and the sort of like understanding is like where like a lot of the value remains. And I feel like this is where like a lot of the value like has always been. And, you know, I think this, even though I think sort of like, you know, system design questions, you know, have like a lot of artifice and, know, are sort of like very sort of performance oriented. I do think to the extent to which they get at like sort of like discussing trade-offs and trying to understand how people think is really valuable.

Shimin (49:50) Mm-hmm.

Nathan Lubchenco (50:05) But one of my sort of general contentions is that the job is just like, know, lots of mundane decisions one after another. You know, like, I haven’t had to write any novel algorithms in my entire career and I don’t expect to. Like, you know, in terms of something of real complexity or anything. And I think it’s basically sort of, you know, it’s just like lots of like, did I expose the endpoint that updates the database correctly? Like, I don’t know, just a lot of software engineering is just very boring, right?

And so I just think optimizing for deep algorithmic wizardry is getting you the wrong kind of thing. And I’d much rather have people oriented as people who are motivated to be on a team and, you know, easy to work with and thoughtful. So I think a lot of these behavioral questions getting at, you know, curiosity are gonna be really important. Like, I think it’s sort of shocking how many people

who are software engineers haven’t even tried agentic coding. And I think, what’s going on with those people? Like, to not be curious even a little bit at this point is just very shocking to me.

Dan (51:09) Yeah, especially given ⁓ how much news and hype and everything else. Sorry, go ahead, Rahul

Rahul Yadav (51:10) and if you

Yeah. And

if you expect everyone to use that at your company, then that should be a filter that it’s not going to work out anyways, right? Why hire someone who might argue with you the whole time about the tools you use instead of actually doing the thing.

Nathan Lubchenco (51:35) Yeah, and I wrote this like last year when I felt like it was like much, it was not clear to me when or if like companies were going to try to begin sort of making more like mandates and that. I, even at that time, I felt very strongly that like someone who was uninterested in AI tools would be a strong no for me. Just because I just think that even if they were a great engineer today, they would not be a great engineer like six months or a year from now if they sort of refused to sort of like adapt and adopt sort of like the future.

Shimin (52:01) Right.

Dan (52:02) And

there was one of the articles we covered a couple weeks ago talking about how this is sort of like exposed a divide in the craft, right? That was probably there all along and we didn’t really know, which is like the folks that cared only about like how good the code is and enjoyed like the process of it versus the people that like care a lot about the outcome. And I think that’s like, for me, my personal struggle is like, it’s not binary, right? Like I’m somewhere on that spectrum.

Nathan Lubchenco (52:28) Hmm.

Dan (52:29) Like I enjoyed the code, but I also really like shipping value, you know? So that’s why I think I’m tending to go, but for some people it’s been very much more binary, I think. And it’s like, I hate this because I enjoy hand crafting. You know, they probably would have loved doing punch cards back in the day or something like, I don’t know.

Nathan Lubchenco (52:47) Yeah,

yeah, I’ve heard the phrase, like, cathedral builder versus bazaar shopkeeper as one of the ways to frame that. And I’m very far on the bazaar shopkeeper side. And I think a lot of early adopters were. I think you had to have some amount of tolerance for code that other people wouldn’t like very much if you were using the agents a lot last year.

Shimin (52:53) Mm-hmm.

Dan (52:53) Mm-hmm.

Shimin (53:07) Yeah. speaking of, sorry, go ahead.

Dan (53:09) So now here we are this year where you can almost barely tell. I mean you can, but like it’s getting closer every minute.

Nathan Lubchenco (53:16) Well, ⁓

mean, maybe this is not a good thing to confess about my own coding ability, but I think the agents are consistently better at coding than I am. I think I get better code quality by using the agents, especially if you combine that with how my hand coding skills have degraded.

Dan (53:31) Yeah, that’s fair.

Shimin (53:31) I think we’re all a little bit in that boat.

Rahul Yadav (53:34) You’re not at Meta, Nathan, so I think you’re safe.

Dan (53:34) Yeah.

Nathan Lubchenco (53:34) I’m

a lot better.

Shimin (53:37) Hahaha

Nathan Lubchenco (53:38) Thanks. Thanks.

Shimin (53:39) But speaking of the future, I have a quote here that you worry about a longer-term future where all jobs collapse into a similar feeling of managing and orchestrating AI models to do tasks that we don’t have great insight into. But you also think that future will be pretty transitory, because how much longer will it be before the models are also better at managing and orchestration? Like, do you think we are there yet, almost? And where do you think the future leads?

Nathan Lubchenco (54:05) Yeah, I mean, I think we’re still not fully there. When I work with the models, I think I am still adding value and I am still assessing trade-offs. But I feel that less and less as each model release rolls around. And the things that I happen to know, facts of software engineering or things about embedding models or whatever it is, I just think in general the models continue to get better at.

So, yeah, I think it’s happening much faster than is comfortable for anyone. Especially with the step change that Mythos seems to suggest, I’ve updated my predictions. I think that maybe late 2026, which I know is very, very soon, could be the point at which you could get better output from a model than from a median software engineer.

And I think scaffolding matters a lot. I think harness engineering plays a lot into that, having the whole framework. When OpenAI released that harness engineering blog, the ability to take a video of the bug to show that it exists, make the fix, do the video: I think that validation aspect is a lot. But if you think about where we’re at in terms of the scaffolding, it’s extremely primitive compared to where we’re going to end up. I mean, think of Claude Code, and harness engineering as a term that someone made up a few weeks ago or something, right?

But I think this is all changing very quickly. And I think the harnesses might actually matter more than model quality at this point, in terms of how all this happens. Because if you look at what can happen when you get use of MCP servers or other skills to get access to internal product docs, or build systems, or CI, or things like that, the productivity jump you can have when you basically don’t have to leave the terminal at all is just really shocking. So, yeah, I think it’s happening quite quickly.

Shimin (55:59) Yeah,

my gut agrees with you, but my heart does not. So let’s leave that.

Nathan Lubchenco (56:06) Okay, let’s go.

Dan (56:06) That’s the

tension, you know? That’s part of why we do the show.

Shimin (56:10) ⁓

Rahul Yadav (56:11) I think someone’s ability to review code is the bottleneck. You know how we did system design: traditionally you write the code and then we do an architecture interview and all that. I think somewhere in there, just reviewing code that some other agent wrote, with a little bit of context to start, would be critical to judge, because that’s the current bottleneck.

And then for anyone who’s, you know, taking six months to find a job, or pick whatever time frame: since building new things is so easy, I would want to see what they have built. You don’t have to have a $200 Claude Max or whatever, but even with a $20 tool, what were some of the ideas that someone had, and what did they build out of them? Because you can then use that to suss out matters of taste and quality and all these other things: how they architected it, and whether they could explain what they did.

So there could be some other ways to attack this problem. Even if you take away a take-home test, or even live use of agents, what are some other things we would want to suss out, and how would we go about doing that?

Nathan Lubchenco (57:32) Yeah, I think those are both great points. I 100% agree that the current bottleneck is code review. I think that will probably only be a next-six-months problem, though. And so I think, yeah, if you need someone now, being excellent at code review is perhaps the single most valuable thing you can contribute to a team right now. I have a teammate who is just excellent at this and has made all of my projects go so much better.

Rahul Yadav (57:38) Yeah.

you

Nathan Lubchenco (57:55) And without working with that teammate, I would say I am no more productive than I was before. But having even one great bought-in teammate who is approaching this problem in the right way, I think, makes all the difference. But I think that’s transitory. And then I totally agree also on the side projects: I never had side projects in a meaningful way, and now I have, like, 10 new repos of things that I’m doing. It’s just because I’d never had the energy or the motivation, and now it’s almost hard not to. So yeah, I think those are both great points.

Dan (58:28) The other thing, God.

Shimin (58:28) Nathan, are you also in

Rahul Yadav (58:28) What?

Shimin (58:30) the Max20 club?

Nathan Lubchenco (58:32) I’m not, I basically sort of like I’ll spin up, I’ll go to like Claude 100 ⁓ when I’m working on a project and then I’ll like, I intentionally, despite all of this I try not to have too many projects going on because I find it actively good to not do too much and so I actually find it good to be limited in some ways for what I can do. It’s like a self-constraint.

Rahul Yadav (58:53) I’m curious why you think code review is transitory or will be solved in the next six or so months.

Dan (58:54) And the other question I was going to.

Nathan Lubchenco (59:00) Well, in the same way that I think the agents are better at coding than I am, I also think the agents are better at code review than a lot of engineers, and I think maybe even better than I am. I think they consistently catch higher-leverage things. My experience with code review in a lot of situations is a lot of subjective preferences, a lot of “this is how it should be for these reasons,” and it doesn’t really end up impacting the quality. Or maybe it impacts quality in some sense, but it doesn’t end up impacting the correctness that much. And so I think the agents take some of that away and are much more likely to find actual mission-critical bugs. So that’s been my experience at least.

Rahul Yadav (59:33) Mm-hmm.

Dan (59:45) So Nathan, I had a question

for you too that’s like somewhat relevant, I think, to the interview piece, which is how do you feel about, or like where do you fall in the camp of show your prompts?

Nathan Lubchenco (59:54) Like you mean like as a way to describe like what you’ve done and why?

Dan (59:58) Well, I’ve seen a new sort of, I wouldn’t call it a divide, but I’m seeing two camps appear. One is just, you know, you work with Claude, or whatever agent harness, and build something, and then you commit it and review it as if it’s yours. And then the other one is do that, but with a hundred percent transparency.

In some cases, I’ll see just the initial prompt included in the pull request body or something like that. In other cases, I’ve actually seen people put the entire transcript of the conversation they had with the agent in there. So I was wondering if either of those is attractive to you, and how you think that falls into this, especially through the lens of interviewing. Do you think that’s actually useful as an interview tool?

Nathan Lubchenco (1:00:48) I think it could be. I mentioned this briefly in the post, how including that sort of session history could be helpful. My main concern is that I think we’re all going to be so poorly calibrated, because we might see how someone else is using it, and it’s hard to know whether that has ever worked for them or whether it worked for them just now. And so I would be really curious to try that exercise: to look at some people who I respected and thought were good at doing this, and people who maybe even self-disclose as bad at doing this, and to try to evaluate whether you can tell the difference. Because I do think one of the most important things you can do right now is knowing when to push back, when to tell a model, “I don’t think that’s right,” or “this seems like you might be overlooking these things.” And I think getting signal on whether a person ever does that matters,

Dan (1:01:19) Me?

Nathan Lubchenco (1:01:40) because I think at this point that’s still very much required to get better results. So yeah, I’m less interested in that for a co-worker, but I think from an interviewer perspective, I think it’d be a really valuable source of data.

Shimin (1:01:51) All right. Well, if you enjoyed that conversation, you can always find more of Nathan’s work at The Future Was Yesterday on Substack. Cool. It’ll be in the show notes. All right, Dan, speaking of getting Max 20 accounts, you want to rant?

Dan (1:02:01) Link in the show notes. Yeah.

Max 20. What do mean by max 20? Isn’t that just the pro? 20 is the pro account for.

Shimin (1:02:16) The, the, the 20, 20 X max account. That’s the 200. Yeah.

Dan (1:02:19) 20x max.

Okay. I thought you meant monetary cost. Yeah. So we hinted at this a little bit last week: there were some rumors floating around that Anthropic was going to remove Claude Code usage from the Pro plan, which is the lowly $20 a month plan, not the hundred or the 200. So 5x or 20x, I guess. And I got pretty pissed off along with a lot of other people. And the discourse I thought was kind of interesting, because there were some people that were basically like, “Do you even Claude, bro? You’re only using the 20.” And I’m like, look, the 20 is my free-time thing.

I do actually really kind of agree with what you said, Nathan, which is, I don’t need to spend all of my time, like eight-plus hours a day at work and then an additional four to five, talking to Claude or some agent. So 20 is fine for me, thank you. But what I would like is the ability to use Claude Code for that 20. And I’ve depended on that for the better part of, I don’t know, a couple of years now at this point, it feels like. Maybe it’s only been a year, but...

So I was pretty furious when I saw that they were considering taking that away. And I’m just like, how, what were they thinking in terms of the…

Like, the way that they were going to have any kind of incoming funnel for engineers, right? Because the $20 plan is essentially, I feel like, the lowest possible amount that you could spend on this and reasonably get some value out of it, with the way that current models kind of blow through their resource caps. And so to me, that’s the perfect way to get started. I’ve bought the equivalent plans on Gemini and OpenAI just to play around with them, right? And that allows me to play around with their coding agents and everything else as well, which is how I decided, hey, maybe Gemini CLI is not really for me. So what were they thinking with taking that away? What would their funnel be, right? You’re just like, “Oh, okay, I used it at work and I jumped straight into a $200 personal plan”? I don’t get it. So I’m sure that…

Based on the discourse, a lot of people were saying this had more to do with the OpenClaw-style harnesses that were hooking into that to essentially abuse the subscriptions. But are people doing that with the $20 plan too? So much that…

I don’t know. Anyway, don’t do it, Anthropic. That’s all I have to say about that. Thanks for listening.

Mm-hmm.

Shimin (1:04:49) Don’t mess it up, Anthropic. ⁓

Rahul Yadav (1:04:52) I think they

Shimin (1:04:52) Yeah.

Rahul Yadav (1:04:52) will then, but they can’t subsidize it forever.

Dan (1:04:53) I know, money always wins if I’ve learned anything.

Sure.

Shimin (1:04:59) Alright, and now onto Rahul’s rampage where he is going to review a book

Rahul Yadav (1:05:05) yeah.

Dan (1:05:05) Kind

of rampaged at that.

Shimin (1:05:06) Yeah, a literary rampage.

Dan (1:05:09) I like it.

Rahul Yadav (1:05:10) My literary rampage. Sebastian Mallaby, who wrote The Power Law, a VC book from a few years ago, wrote this book called The Infinity Machine. It’s very much focused on Demis Hassabis and his journey from being a wee four-year-old chess player who sits on books to compete with other people, to then starting DeepMind and all of that. For anyone who’s seen The Thinking Game, and the one before that, I think they had made one about AlphaGo as well, a lot of that information is already out there.

Some of the things that were really interesting from the book were this constant tension between Demis wanting to do science and, once he sells the company to Google, operating inside Google, where AI is the biggest threat to search and it’s very much a protect-the-business problem. They even spend at some point a whole year trying to not be under Google by spinning up a separate entity, which wastes a lot of their time. They’re flying from the UK to Mountain View every week or every other week just to talk in person. And I’m like, have you heard of Zoom calls or Google Meet? Maybe see if that does something. And they can’t get to any resolution there. And then ChatGPT happens.

And that all of a sudden catches them a little bit blindsided. And then they put a lot of their focus on LLMs and everything.

The interesting thing that ties into that is this article that Shimin has on the screen by Philip D. Dubac. Philip, if you listen to this, I’m a great big fan. I read almost all your articles. Your Zach Houser series you’ve been working on has been great. So keep up the good work. This article, it’s about Archimedes, I think. It was one of those old Greek guys and how he prevented a war.

Dan (1:07:18) Nathan knows who

they are, I’m just asking.

Rahul Yadav (1:07:20) He held off the Romans for a couple of years.

What Philip’s trying to get to in this article is that AlphaFold cost about a few hundred million dollars, 160 million I think was the cost of training AlphaFold. And if you look at the impact that has had, you’ve been able to solve all these protein folding problems, and it’s spun up so many vaccine research efforts and all these things that wouldn’t have been possible in any reasonable timeframe. Sorry, the training cost was one million for AlphaFold 2 by Philip’s estimate. Compare that to how much money is going into the current infrastructure build-out, how much OpenAI spends almost every day just to operate, or pick any other big LLM company.

And at the end of the day, it’s some version of a chatbot that is helping us do other things. Now, I know there would be future benefits of the investments we’re making today, because you can use that infrastructure for other things too; it doesn’t have to be restricted to chatbots or anything like that. But this ties back into the constant tension that Demis has. You can see throughout the book where he references the Nobel: you could give him billions of dollars and he still wouldn’t trade the Nobel for that, because fundamentally he wants to do science. And you see him caught in this competitive pressure and these things that they have to do instead of being able to focus on science.

Close to the end of the book, he talks about Princeton and how people used to go there to do research and stuff. And if you go look it up on YouTube, from back in 1955, Ed Murrow has an interview with Oppenheimer. This is 1955, a decade after the Second World War ended. And he asks Oppenheimer, you know, what attracts these scientists here? And he’s like, they’re addicted to being interrupted, their life depends on being interrupted. But once they come here, there are no interruptions, no everyday answering of these things and all that. They have to sit and face the hard problems and be able to solve them. Their whole job is just to think really hard about some of the fundamental problems and solve them.

So close to the end of the book, Demis references Princeton, and I think that was a nice touch. And they talk about Oppenheimer a couple of times throughout the book, obviously, because, you know, AI and the nuclear bomb and all that. And so you can see that constant tension: Demis wants to do science but is currently under Google trying to get Gemini and all that to happen.

So that to me was one of the biggest takeaways: through all the corporate tensions and everything, the continuous pull between what you would like to do and what you have to do. And even that guy can’t afford to do what he would like to do versus what he has to do.

Shimin (1:10:25) Mm-hmm.

Dan (1:10:31) Hahaha.

Shimin (1:10:33) Yeah, good thing we’re

going to have far fewer AI talent wanting to come to Princeton now, according to the Stanford AI study. Nathan, you have an academic background too. Like how do you feel about this tension between trying to, you know, the cathedral building versus the, I think this article had a quote, you can’t have the cathedral without the wool merchant, which I found to be really good.

Rahul Yadav (1:10:37) Hahaha

Nathan Lubchenco (1:10:55) Hmm.

Dan (1:10:56) you

Nathan Lubchenco (1:10:56) That’s good. Yeah. Yeah, I mean, I guess in some ways I just sort of gave up, because I don’t think I had the sort of top ability to contribute. I think really advancing the state of knowledge is super hard. And instead, I don’t know, being productive and contributing to the economy and hopefully heading for an early retirement is the path for me. So yeah, I guess I feel less burdened by that. But I think one of the things that is compelling to me about early retirement is that then, to the extent to which I would choose to maybe

Shimin (1:11:17) Ha

Nathan Lubchenco (1:11:29) work on AI safety or something like that, which maybe would pay a lot less, or other sorts of things. I might be interested in model character or something, or maybe just thinking about a lot of the things that Will MacAskill is thinking about at the Forethought Institute. Maybe I would just have a lot more flexibility about how I actually spend my time. And so I think maybe Demis thinks that he can’t do these things and maybe feels a level of responsibility, but he actually

Shimin (1:11:31) Mm-hmm.

Nathan Lubchenco (1:11:56) could potentially choose otherwise if he was willing to abdicate the level of responsibility that he has, and maybe he doesn’t want to do that. But I am unburdened by that.

Shimin (1:12:07) You

Dan (1:12:07) you

Rahul Yadav (1:12:08) And, you know, DeepMind is his life’s work. So that’s still his best shot at building AGI and everything. The thing that you continuously see is he doesn’t, the corporate side is really taking so much of his time, which if you just think about where that could have gone if he didn’t have to do that, you know, you see that throughout the book after the Google acquisition.

Nathan Lubchenco (1:12:36) Yeah, and this is a bit of a tangent, but you’re reminding me of him playing chess. I think one of the reasons I have the strong intuitions I do is that I’m a chess player, and computers have been better than me at chess for a very, very long time. And I think there’s something about having cared about chess, having found chess beautiful, and having

Dan (1:12:36) I like.

Nathan Lubchenco (1:12:58) computers be better at it than me has not reduced the beauty of playing chess at all. And I think there is something that is different for people who’ve had that experience for 20 plus years versus people who maybe aren’t used to computers being better at things they care about. And I think that’s just one way to think about when there’s a divide of how people view these things differently.

Shimin (1:13:17) Yeah, I think about that a good deal. When it comes to the Jagged Edge, chess is always one of those intellectual pursuits that’s the peak of human capabilities. And we’re okay. We’re okay even though the chess engines are just way better than us now. So maybe we’ll be fine after all.

All right, shall we move on to our last segment? Dan, you want to intro it?

Dan (1:13:38) Yeah,

sure. Yeah, so next up, we’ve got two minutes to midnight, where we talk about where we are or where we might be in the AI bubble through the lens of the 1950s style atomic clock. midnight being everything, the hype machine has collapsed and we’re all doomed. And I don’t know what minus 10

cool.

Yeah, so we’ve got three articles this week. We’re not going to go into great depth on all of them because of such a great chat with Nathan, but I wanted to start off with just a temperature check on where we’re at in terms of funding. And one of the things that immediately popped into my head when I saw this TechCrunch article was, okay, so that’s where we’re at in the hype cycle, which is: two college kids raise a $5.1 million pre-seed to build an AI social network. Note the words AI, the letters AI, in iMessage.

Yeah, so no shade at all whatsoever on the folks doing this or the raise or anything else. It’s just kind of like, to me, I read this as: you’ve literally put the letters AI in something and you’re being thrown money for doing that. Yeah. Allbirds. Yeah, all compute birds.

Nathan Lubchenco (1:14:45) Allbirds.

Shimin (1:14:47) Allbirds, yeah.

Mhm.

Dan (1:14:52) Anyway, so that’s as deep as I want to get into this one. We’ve got two others to go through. So, Rahul, you had one on ⁓ are the costs of AI agents also rising exponentially?

Rahul Yadav (1:15:04) With the Claude pricing, do I even need to say anything more about it? The way every day we start hitting our limits so quickly, and how much Anthropic wants to charge for Claude? Yeah. But yeah, Toby Ord wrote this article

Shimin (1:15:07) hahahaha

Dan (1:15:08) My rant

is that what saying?

Every day on Lemonade.

Rahul Yadav (1:15:26) about how capabilities are going up, but so is the cost of these cutting-edge models, and how much it costs to serve them. I had honestly forgotten that o3 is still around. Maybe it is. I haven’t used ChatGPT in a while. But apparently, I think Nathan can maybe balance us out a little.

Dan (1:15:41) We really need to fix our anthropic bias on this podcast. I’m just saying.

Rahul Yadav (1:15:49) But apparently, o3 can cost up to like $350 an hour on complex tasks, the comparison there being you can hire a pretty decent lawyer or therapist or high-performing person in their job for $350 an hour. So we’re getting closer and closer to where, economically looking at these things, a human might be able to do these things better than a model can, especially since the model’s success rate is close to 50%. So half of the time it’s going to fail and you still spend the 350 bucks an hour, versus a human who might have a better chance of accomplishing those tasks. So over time it should get cheaper, but short term I think it’s going to rise more, until we have more GPU capacity and all the other bottlenecks get solved. So yeah, I don’t know how this one will play out.
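
A rough sketch of that comparison, assuming each attempt is independent and failed attempts are retried until one succeeds, so the expected number of attempts is 1 / success rate. The $350/hour and 50% figures are the ones quoted above; the one-hour task length, the 90% human success rate, and the function name `expected_cost_per_success` are hypothetical, chosen only for illustration.

```python
def expected_cost_per_success(hourly_cost: float,
                              hours_per_attempt: float,
                              success_rate: float) -> float:
    """Expected spend to obtain one successful result when failures are retried.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate (mean of a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return hourly_cost * hours_per_attempt / success_rate


# Figures quoted in the episode: $350/hour and roughly 50% success for the model.
# The one-hour task length is an assumption.
model_cost = expected_cost_per_success(350.0, 1.0, 0.5)

# Hypothetical human professional at the same rate with a 90% success rate.
human_cost = expected_cost_per_success(350.0, 1.0, 0.9)

print(f"model: ${model_cost:.2f} per successful task")  # model: $700.00 per successful task
print(f"human: ${human_cost:.2f} per successful task")  # human: $388.89 per successful task
```

Under these assumptions, a 50% success rate doubles the effective price of a finished task, which is the point of the comparison with a human at the same hourly rate.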

Shimin (1:16:52) Yeah.

And lastly, I have a quick article from CNBC today, on April 28th, where OpenAI reportedly missed its own internal revenue targets and all of its compute partners’ shares dropped a ton. So Oracle, for example, dropped 5% in a day. So we might be seeing kind of the end of the, you know, “we’re going to pretend like we have all the compute we need.”

Nathan, do you reject the premise? Speaking of the premise, like do you reject the premise that we’re in AI bubble? Okay.

Nathan Lubchenco (1:17:23) I do.

Good read. Good read. I mean, I guess it’s complicated, because basically I think there is at least one form of AI bubble that I believe in, and that’s the VC funding bubble. I think VC money is funding things that are just bad ideas, and a lot of those companies aren’t going to survive and are going to be terrible.

Dan (1:17:27) You

Nathan Lubchenco (1:17:43) Also no shade to the people who are working on them. You’re doing whatever you need to do. I learned through startups that would not have existed if it had not been for the zero-interest-rate phenomenon. That’s how I learned to code. So do whatever you’ve got to do. But so I think that’s true. And then if you also look at some other things, like my friend sent me something where Starbucks has now integrated ChatGPT so that it can help you with your order, and nobody wants that. Nobody needs that. So that’s the form of bubble.

Dan (1:17:51) You

I think Logitech still wins.

Rahul Yadav (1:18:06) or Chipotle writing your

Shimin (1:18:08) Yes

Nathan Lubchenco (1:18:09) Yeah, yeah, exactly. So I think that all is totally, like, there’s lots of signs of bubble, but I think that ignores sort of like the long-term value. I think like the sort of like the capex investment, the sort of compute constraint and what was that Dan?

Dan (1:18:17) Yeah, we’re not saying the value isn’t there. We’re just, yeah,

well, I think we’re aligned on our, yeah, we’re not saying the value isn’t there. Like that’s why we started doing this, because it’s like talking about like where, what’s going to survive in the world where that happens, because clearly some of it is, right? Like there’s like actual value here. And then the second piece was like, you know, sort of looking at it through the lens of like, wow, like a lot of these things are just.

wrappers around ChatGPT right? Like what’s gonna happen to them, yeah.

Nathan Lubchenco (1:18:47) Yeah, yeah. And

I think wrappers, but I also think, you know, there was a long period of time where a lot of products would be described as a wrapper around a CRUD app, you know. And I think those ended up being more valuable than people thought in some cases. And so I think it could also be true that some GPT wrappers are valuable. But so I think it’s complicated.

Dan (1:19:06) Yeah. Well,

yeah, when you get into like agents, I think now we’re starting to see that, right? Like if you’re, if your wrapper becomes a harness, then yeah, then it starts becoming. Yeah. Let’s focus on the

Nathan Lubchenco (1:19:16) Yeah, because I think scaffolding has been undervalued. Yeah.

Rahul Yadav (1:19:20) With the

Shimin (1:19:21) Well, we-

Sorry, Rahul go ahead.

Rahul Yadav (1:19:23) There’s like a couple of things that we’ll see how those play out. The pay-as-you-go pricing means you’re not getting, you know, massive enterprise discounts or anything, and that can change very quickly, and so can your runway. That, plus people bragging about how many tokens they’ve burned every day for

Dan (1:19:43) You

Rahul Yadav (1:19:44) you know, a dubious amount of value delivered. Between those two things, I could see a world where runways that people thought were multiple years or whatever then get compressed, because the fundamental thing you were using to build your product is costing you more than the value you’re realizing from it. It almost becomes like restaurants, which famously have 30 to 60 days of margin, like during COVID and all that, when you run out of money very quickly. As prices go up, does the value you get out of it match up? A lot of companies are just going to have to either not use the coding agents and models at all, or if they keep using them, they’re in this very stressful environment where you have to make sure that you can actually keep paying for it, because it’s pay as you go and there’s not much you can do about it. So that is about to, I think, come into play pretty soon with the prices going up everywhere.

Shimin (1:20:47) Yeah. But speaking of which, we were at three minutes and 30 seconds last week. How do you feel about this week?

Dan (1:20:54) I personally feel like nothing has really changed.

Rahul Yadav (1:20:56) Yeah.

Shimin (1:20:56) Alright, I feel like I can move it ahead by 15 seconds, but I’m willing to leave it at 3.30. Okay.

Dan (1:21:03) Why

do you want to go ahead?

Shimin (1:21:04) The compute crunch that is kind of manifesting in the GitHub Copilot changes and Anthropic’s policy changes, from the undelivered data warehouses, and OpenAI’s missed internal revenue targets.

Dan (1:21:06) Mm-hmm.

Hmm. And you think compute crunch is going to lead to financial crunch?

Shimin (1:21:22) It’s going to,

I think compute crunch is going to lead to them having to raise their prices, which will lead to them not meeting their $1 trillion of revenue that they need over the next five years in order to. Yeah. I mean, I got, that’s just the numbers. They have to grow. Not only do they have to grow, they have to grow like at a rate of $1 trillion over the next eight years. and like that’s, you know, I don’t know. We’ve never seen something like this before. short of the railroad. Right. So. Yeah.

Dan (1:21:31) Yeah.

Rahul Yadav (1:21:48) But Shimin, Allbirds is selling all the compute you want.

Shimin (1:21:52) Yes, you can buy Allbirds sneakers with GPUs attached to them now. Yes.

Dan (1:21:53) You

Rahul Yadav (1:21:54) Yeah, there’s GPUs

Dan (1:21:55) You

Rahul Yadav (1:21:58) snuck into the shoes instead of that stupid brown paper. GPUs are padding the whole thing now.

Shimin (1:22:02) ⁓

Dan (1:22:05) Man,

better go out and buy a couple pairs so can use some more compute.

Shimin (1:22:08) All right, well, let’s leave it at 3.30 and we’ll see where things are next week. Uh-huh. Okay.

Dan (1:22:08) framework

Well, hold on, we have four people on today. What does Nathan think?

Nathan Lubchenco (1:22:16) I guess I would probably, I mean, is this towards a bubble or towards what exactly? Yeah, if anything, I would move back then. I would move back from 3.30 to like four minutes or 4.30 even, because basically I think the…

Dan (1:22:20) Midnight is the bubble has burst. So where do you think we’re at in the?

Nathan Lubchenco (1:22:32) The potential cybersecurity implications for Mythos and potentially 5.5 combined with the fact that DeepSeek may have closed the gap and is only three to six months behind, it’s around 5.2 era, basically means that open weight models may have cybersecurity capabilities equal to Mythos in the next six months. And I think the implications from that geopolitically and economically are…

Dan (1:22:36) Mmm.

Mm-hmm.

Nathan Lubchenco (1:23:00) quite substantial, so that feels very anti-bubble to me.

Dan (1:23:03) Mm-hmm, like too big to fail is what you’re saying.

Nathan Lubchenco (1:23:06) Yeah, because I think essentially someone has been tracking how much people in the US legislature mention AI, and they’ve made this sort of METR doubling plot, but for people mentioning it, and it has a very similar doubling time. And so I think it is going to be increasingly politically relevant, and potentially, you know, how either, you know,

Shimin (1:23:16) Mm. That’s all that.

Dan (1:23:21) You

Nathan Lubchenco (1:23:31) government intervention or sort of other phases. So yeah, I think we’re getting closer to too big to fail because of potential national security reasons or similar.

Shimin (1:23:38) Okay, hot take. Not that hot. On this show, our season finale is this November, when all of these factors come into view at once: the politics, the too big to fail, the security issues, fireworks, explosions, all the good stuff. Yeah.

Nathan Lubchenco (1:23:38) Hot take.

Dan (1:23:40) Not that hot. Yeah, sounded well grounded to me.

Stay tuned for the next episode of Artificial Developer Intelligence and this season of America.

Maybe you’re really entertained by it, in which case, get ready.

Rahul Yadav (1:24:08) We like Mistral so much we might even use it one of these days. Honestly guys.

Dan (1:24:12) Yeah, that’s the real bias, right? We’re talking about between OpenAI and Anthropic, but we never even… Well, that’s not true. We talk about open-weight stuff.

Shimin (1:24:20) with you. Well, lots of different opinions. Shall we land the plane on a time?

Dan (1:24:24) I think in honor of Nathan’s well-reasoned argument, I could go back a little bit, but not a whole minute.

Shimin (1:24:31) Okay, I could be talked into four.

Dan (1:24:34) Four? Whoa, okay.

Shimin (1:24:35) Four minutes, yeah, 30 seconds. I’ll take bigger swings. All right, Rahul.

Dan (1:24:38) You do.

I like little slices. Rahul doesn’t even like the segment. He’s like, why do we do this segment, guys?

Rahul Yadav (1:24:39) What did you guys pick?

Shimin (1:24:41) Four. Okay. Rahul is in. Alright, great. That’s it. We’re now at four

Nathan Lubchenco (1:24:45) win!

Shimin (1:24:46) minutes to midnight.

Rahul Yadav (1:24:48) Nathan, I would be making money off of this stuff if I could tell which way this is going instead of being on this lowly podcast trying to, you know. Yeah.

Dan (1:24:53) So would we all. We wouldn’t be…

Nathan Lubchenco (1:24:53) You

Shimin (1:24:57) when when when we be all, and

Dan (1:25:00) We also noticed a fun phenomenon where every week we tended to swing back and forth. So it’s pretty great. I know, it’s true.

Rahul Yadav (1:25:04) Hehehehehe

Shimin (1:25:06) We’ll be on the going back phase. And with the clock being adjusted, as always, that is the show, folks. Thank you all for tuning in and joining us in our conversation this week. I’d like to thank you, Nathan, again, for hopping on and being a guest host. And you can find Nathan’s work at The Future Was Yesterday on Substack. Yeah, thanks again for joining us.

Dan (1:25:26) was.

Shimin (1:25:30) If you like the show, if you learned something new, please share the show with a friend. You can also leave us a review on Apple Podcasts or Spotify. It helps people discover the show, and we really appreciate it. If you have a segment idea, a question for us, or a topic you want us to cover, shoot us an email at humans@adipod.ai. We’d love to hear from you. You can find the full show notes, the transcripts, and everything else mentioned today at www.adipod.ai. Thank you again for listening. I’ll catch you next week. Bye.
