I watched Andrej Karpathy's conversation on the No Priors podcast this week and I haven't been able to stop thinking about it. Not because of any single takeaway, but because of how many things he said that quietly confirmed what I've been feeling for months. The way we build things changed, and it happened faster than most people realize.

You can watch the full conversation here: Andrej Karpathy on No Priors

I'm going to unpack the big ideas from that conversation, but I also want to go a little further. If you're someone who builds things solo, teaches yourself everything, and cares about systems, this shift probably matters more to you than to anyone else.

December Was the Phase Change

Karpathy said something that hit hard: he went from writing 80% of his code manually in November 2025 to delegating 80% to agents by December. He claims he hasn't typed a line of code since.

That's a workflow inversion that happened over a few weeks.

What apparently flipped was a convergence of things. Claude Opus 4.5 shipped in late November, Gemini 3 and GPT 5.2 dropped in December, and the scaffolding around these models (tools like Claude Code and Cursor) had been quietly maturing all year. Then developers got holiday downtime and actually pushed these tools on ambitious, from-scratch projects. The result, as Latent.Space documented, was a qualitative threshold crossing: coding agents went from brittle demos to sustained, long-horizon task completion.

Karpathy described it as going from being bottlenecked by typing speed to being bottlenecked by your own ability to direct agents. He called it "AI psychosis," the feeling that everything is a skill issue now, that if something doesn't work, it's maybe not because the capability isn't there, but because you haven't figured out the right way to string it together yet.

If you've ever felt that weird mix of excitement and anxiety when a new tool unlocks something, that might be the feeling he's describing, just at a much higher intensity.

Macro Actions Over Your Repository

One of the most practical things from the conversation was how Karpathy described his actual workflow. He mentioned Peter Steinberger, creator of OpenClaw, an autonomous AI agent framework that hit 145,000 GitHub stars and got Steinberger hired by OpenAI. Steinberger apparently tiles his monitor with multiple agent sessions across maybe 10 repos and moves between them.

The unit of work has shifted. You start thinking in terms of "delegate a new functionality to Agent 1, and another to Agent 2." You develop a muscle memory for macro actions over your repository.

Karpathy elaborated on X about what this looks like in practice: a few Claude Code sessions in terminal windows on the left, an IDE on the right for viewing code and manual edits. The models still make mistakes, though they're less likely to be simple syntax errors now. More like subtle conceptual errors that a slightly sloppy, hasty junior dev might make. They overcomplicate code, bloat abstractions, and don't always clean up after themselves. You watch them like a hawk.

But here's what I found most interesting: Karpathy suggested that instead of telling agents what to do step by step, you give them success criteria and watch them go. Write tests first, then ask the agent to pass them. Put the agent in a loop with a browser. Write the naive algorithm first, then ask it to optimize while preserving correctness.

Basically, shifting toward declarative thinking. Telling them what you want and letting them figure out the how. That seems to be where the leverage lives.
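A minimal sketch of what that test-first pattern looks like in practice. The `slugify` function and its tests are hypothetical, purely to illustrate the shape: you write the success criteria first, and the agent's only job is to make them pass.

```python
import re

# Success criteria written first: these tests define "done" for the agent.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces   everywhere ") == "spaces-everywhere"
    assert slugify("already-a-slug") == "already-a-slug"

# One implementation an agent might converge on to satisfy the tests above.
def slugify(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)  # collapse runs of non-alphanumerics
    return text.strip("-")                   # drop leading/trailing separators

test_slugify()
```

The tests are the specification; the implementation is disposable. That's the inversion: you own the "what," the agent owns the "how."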

The AGENTS.md File: Your System for Directing Agents

Karpathy mentioned that part of the skill issue is things like not giving good enough instructions in an "agents.md" file. This is maybe the most directly actionable thing from the whole conversation if you're a builder.

An AGENTS.md file is essentially a README, but written for your AI agents instead of for humans. It tells the agent: here's the tech stack, here are the conventions, here's what good code looks like in this project, here are the commands you can run, and here's what you should never do.

GitHub analyzed over 2,500 AGENTS.md files across public repos and found a clear pattern: the ones that work give the agent a specific persona, exact commands, well-defined boundaries, and concrete examples of good output. "You are a helpful coding assistant" doesn't cut it. "You are a test engineer who writes tests for React components, follows these examples, and never modifies source code," that works.

Builder.io's guide breaks down the practical structure:

  • Dos and Don'ts: compile these through trial and error. Run prompts, see what you don't like, add the feedback.
  • File-scoped commands: don't let agents run full project builds every time. Give them commands that type-check or lint a single file.
  • Concrete examples: point to real files that demonstrate your best patterns, and point away from legacy files to avoid.
  • Safety boundaries: use three tiers (always do, ask first, never do).
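Put together, a tight AGENTS.md following that structure might look something like this. The stack, commands, and file paths here are illustrative, not from any real project:

```markdown
# AGENTS.md

## Persona
You are a test engineer for this React + TypeScript app. You write and
fix tests. You never modify source code under src/.

## Commands (file-scoped)
- Type-check one file: npx tsc --noEmit path/to/file.ts
- Lint one file: npx eslint path/to/file.ts
- Run one test file: npx vitest run path/to/file.test.ts

## Do / Don't
- Do follow the patterns in tests/example.test.ts.
- Don't copy patterns from legacy/; it predates our conventions.

## Boundaries
- Always: run the relevant test file before finishing.
- Ask first: adding a new dependency.
- Never: modify files under src/ or delete existing tests.
```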

The interesting wrinkle? Some research suggests that overly large AGENTS.md files can actually reduce agent success rates and increase costs. The people getting the best results seem to keep their instructions tight and essential, then iterate. It might be less about writing a comprehensive manual and more about building a feedback loop, kind of like how you'd onboard a real junior developer.

For solo builders, this feels like a meaningful shift. Your AGENTS.md becomes one of your most important project files. It's a management system for your AI team, more than it is traditional documentation.

The "Claw" Layer: Persistence Beyond a Single Session

Karpathy used the term "claw" to describe something that goes beyond a standard agent session. A claw is an agent with persistence, something that keeps looping in its own sandbox, doing things on your behalf even when you're not watching.

He built "Dobby the elf claw" to manage his home. He told it he had Sonos speakers, and it did an IP scan of his local network, found the system, reverse engineered the API, and started playing music. It did the same for lights, HVAC, shades, and his security system. Now he just texts Dobby through WhatsApp in natural language. It replaced six different apps.

This is where the idea of agent first architecture starts to crystallize. Karpathy's argument is that software should just be APIs that agents use as glue. The customer might not be the human anymore. It could be the agent acting on behalf of the human. The whole UI layer, the onboarding flows, the settings pages, all of that potentially gets replaced by natural language interaction with a persistent agent that knows your setup.

It seems like this is still early. Karpathy himself said he's cautious about giving an agent full access to his digital life for security and privacy reasons. Security experts have warned that autonomous agents like OpenClaw carry real risks. They can go rogue, they're exposed to untrusted inputs, and their error margins are still too high for certain tasks.

But the direction seems clear enough. For builders, maybe the question is less about whether this happens and more about how you prepare your projects and systems for it.

AutoResearch: Removing Yourself as the Bottleneck

Perhaps the most mind-bending thing Karpathy discussed was AutoResearch, his attempt to take the human researcher out of the loop entirely.

Here's the setup: he gave an AI agent a small but real LLM training codebase (about 630 lines of Python), a single objective metric to optimize, and let it run experiments autonomously. In two days, it ran 700 experiments and found 20 optimizations that improved training, including hyperparameter tunings that Karpathy himself had missed despite two decades of training models.

Fortune called it "the Karpathy Loop": an agent with access to a single file it can modify, a single testable metric to optimize, and a fixed time limit for each experiment. The agent reads research, develops hypotheses, edits training code, runs experiments, learns from failures, and keeps going.

Shopify CEO Tobias Lütke reportedly tried it on internal company data and got a 19% performance gain overnight from 37 experiments.

The structure is worth sitting with if you build things:

  1. A clearly scoped file the agent can modify
  2. An objective, measurable metric for success
  3. Constraints telling the agent what not to touch
  4. A stopping criterion so the loop doesn't run forever
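Those four ingredients can be sketched as a toy hill-climbing loop. This is my own illustration, not Karpathy's code: `run_experiment` is a stand-in for a real training run, and the numbered comments map back to the list above.

```python
import random

def run_experiment(params):
    # Toy stand-in for a training run: this metric peaks at lr=0.1, batch=64.
    # In the real loop this would launch training and return a score to maximize.
    return -((params["lr"] - 0.1) ** 2) - ((params["batch"] - 64) / 64) ** 2

def auto_research(params, budget=200, seed=0):
    """Propose a change, measure the metric, keep it only if it improves."""
    rng = random.Random(seed)
    best_score = run_experiment(params)
    for _ in range(budget):                   # 4. stopping criterion: fixed budget
        candidate = dict(params)              # 1. scoped: only this "file" changes
        key = rng.choice(["lr", "batch"])     # 3. constraints: only these knobs
        candidate[key] = candidate[key] * rng.uniform(0.5, 2.0)
        if key == "batch":
            candidate["batch"] = max(1, int(candidate["batch"]))
        score = run_experiment(candidate)     # 2. objective, measurable metric
        if score > best_score:
            params, best_score = candidate, score
    return params, best_score

best, score = auto_research({"lr": 0.01, "batch": 16})
```

An LLM-driven version replaces the random proposal step with an agent that reads results and edits real training code, but the scaffolding (scope, metric, constraints, budget) stays the same.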

Karpathy pointed out an important caveat: this works best when you have something verifiable. If you can measure it, you can probably auto research it. If you can't evaluate it, like whether a joke is actually funny, or whether a piece of writing captures the right tone, you probably can't automate the improvement loop. That might be the most honest assessment of where the limits still are.

The Jevons Paradox: Why Cheaper Software Might Mean More Software

Karpathy brought up the Jevons paradox in the context of software engineering jobs, and I think it's one of the more useful mental models for understanding what's happening.

The original Jevons paradox comes from 19th century economics: when steam engines got more fuel efficient, coal consumption didn't decrease. It exploded, because cheaper energy unlocked new uses. Something similar seems to be playing out with software.

AI makes building software 2x to 10x faster for certain categories: internal dashboards, prototypes, CRUD apps, small scale automation. As Turing College documented, this doesn't necessarily mean companies build the same products with fewer people. It could mean thousands of previously unjustifiable projects suddenly become economically viable. Internal tools that needed a team of five for six months can be prototyped in a week. Custom integrations that sat on the "nice to have" backlog for years become two-sprint projects.

The total surface area of "things worth building" may have expanded by an order of magnitude.

For solo builders, this might be the most important takeaway. The value probably isn't in writing code itself. It's more in knowing what's worth building, specifying it clearly, and verifying the output. That's a pretty different skill set than what most people think of as "programming."

Karpathy said it directly: software will become ephemeral and easier to rewire. The thing that stays expensive is specification and verification, knowing what to build and knowing whether what was built is actually correct.

The Jaggedness Problem

One of the most honest moments in the conversation was when Karpathy described current models as "jagged." They can feel like a brilliant PhD student and a 10-year-old at the same time.

He gave a great example: if you ask ChatGPT for a joke, you tend to get the same one about scientists not trusting atoms because "they make everything up." That joke hasn't improved in years. Why? Because it's outside the reinforcement learning loop that optimizes code and math. The models get relentlessly better at things that have clear, objective success criteria, and they stay weirdly stuck on things that don't.

This jaggedness means you can't fully let them loose yet. They struggle with nuance, with knowing when to ask clarifying questions, with the soft edges of real world problems. For builders, this probably means you're still the quality control layer. The agent might be your hands, but you're still the judgment.

Education Gets Reshuffled

The last thing that stuck with me was Karpathy's take on education. He built microGPT, a GPT trained from scratch in 243 lines of pure Python, no PyTorch, and his realization was: I don't need to explain this to humans anymore. I explain it to agents. If the agent gets it, it can explain it to any human in any language with infinite patience.

That's a pretty wild inversion of how we think about documentation and teaching. Instead of writing tutorials for people, you write the core algorithm for agents, and the agents become the teachers. Karpathy said his value add is the "few bits" of intuition in those 200 lines that the agent can't quite come up with on its own yet.

For anyone who teaches themselves everything, and I'd guess that describes most people reading this blog, the implication might be that the self teaching loop is about to get radically faster. You don't necessarily need to find the right tutorial, the right pace, the right level of explanation. You need an agent that understands the core material, and then you have a patient, infinitely customizable tutor.

What This Means If You Build Things Solo

Here's what I'm taking away from all of this:

The bottleneck moved. It used to be typing speed, then coding skill, and now it seems like it's shifting toward your ability to specify, direct, and verify. If you've been someone who builds things by teaching yourself (reading docs, grinding through Stack Overflow, figuring it out), the new version of that might be learning to manage agents effectively. Your AGENTS.md file, your prompting patterns, your verification habits. Those could be your new core tools.

Token throughput as a metric. Karpathy compared unused subscription time to a PhD student's GPUs sitting idle. If your agents aren't running, you're potentially leaving leverage on the table. For solo builders, this might mean rethinking how you structure your work day. Maybe it's less about deep focus coding sessions and more about keeping multiple agent sessions productive at once.

Declarative thinking. Instead of walking agents through something step by step, try telling them what success looks like and letting them loop until they get there. Write tests first. Give them success criteria. This seems like one of the more leveraged changes you could make to your workflow right now.

Verifiable metrics unlock automation. If you can measure it, you can potentially auto research it. If you can't, you're probably still the bottleneck. It might be worth looking at your current projects and asking: what has a clear, testable success metric? That's where you point the agents.

Software is becoming ephemeral. The Jevons paradox suggests we'll have more software than ever, but much of it could be disposable, created on demand, customized to the moment. The skill that endures might be knowing what to build and whether it works.

I don't think any of this is settled yet. Karpathy himself said he's nervous, and if the person who coined "vibe coding" is nervous, maybe we should all be paying close attention. But paying attention is different from panicking. These tools can be really empowering if you stay updated. And if you're the kind of person who builds things and teaches yourself everything, you've probably been training for this kind of rapid adaptation your whole life.

The game just changed. The systems need to change with it.