Andrej Karpathy Summed Up the LLM Ecosystem of 2025
And this is a rare case when a “year-in-review” actually helps you see the whole picture
Editor’s note
In December, the AI news feed usually turns into an endless fireworks show of releases and “best-of-the-best” roundups. But Andrej Karpathy managed to do something very few people can: he described 2025 not as a list of models, but as a shift in habits, architectures, and even metaphors.
His central metaphor of the year is uncomfortably precise: we are not “raising animals,” we are “summoning ghosts.” That is, we are dealing with an intelligence that can be brilliant in one domain and strangely helpless in another — and this is not a bug on the road to “human-like AI,” but a property of the technology itself.
A Russian translation of Karpathy’s text appeared on Habr (by the AI for Devs team), and if you missed the original, it captures all the key turns very well.
Six shifts of 2025 according to Karpathy — in plain human language
| What shifted | How it feels in practice |
| --- | --- |
| RLVR (reinforcement learning from verifiable rewards) became a new baseline stage | Models began to “grow” reasoning and long chains of solutions not because we politely asked them to, but because it is rewarded. |
| We finally “felt the shape” of LLM intelligence | It is jagged: simultaneously a polymath and a naïve schoolkid. As a result, benchmarks explain real usefulness worse and worse. |
| A new application layer emerged on top of models | What matters now is not “which LLM,” but “how you assemble context,” how you orchestrate call chains, and how much control you give the human (“autonomy slider”). |
| The agent moved onto the developer’s computer | Karpathy singles out Claude Code as a “spirit” living next to your files, secrets, and context — and therefore more useful than a cloud agent in a sterile container. |
| Vibe coding went mainstream | Code suddenly became “cheap and disposable”: writing a small utility for one evening is normal; throwing it away is normal too. This changes professions and development tempo. |
| AI started talking to humans with images, not text | Karpathy’s “nano banana” is not about memes, but about the birth of a GUI era for LLMs: people think visually more easily than they read kilometers of text. |
Why these six points are the nerve of the era — not just a personal opinion
1) RLVR: 2025 showed that “intelligence” can literally be bought with compute
Karpathy describes RLVR (reinforcement learning from verifiable rewards) as a new phase that started consuming compute previously allocated to classic pretraining. In human terms: if you have an environment where answers can be automatically verified — mathematics, code, formally verifiable tasks — you can keep “pressing” the model with rewards, and it will discover strategies that look like reasoning.
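The mechanics can be sketched with a toy example. Everything here is illustrative, not Karpathy's code: `noisy_model` stands in for an LLM sampling answers, the task (summing a list) is a stand-in for math or code with a programmatically checkable answer, and best-of-N selection stands in for the reinforcement step, which in real RLVR would update the model on rewarded trajectories rather than merely pick among them.

```python
import random

def verifier(task, answer):
    """Verifiable reward: 1.0 iff the answer is exactly correct.
    No human judgment involved -- this is what makes the domain 'RLVR-able'."""
    return 1.0 if answer == sum(task) else 0.0

def noisy_model(task, temperature):
    """Stand-in for an LLM: usually right, sometimes off by one."""
    guess = sum(task)
    if random.random() < temperature:
        guess += random.choice([-1, 1])
    return guess

def best_of_n(task, n=8, temperature=0.5):
    """Sample several trajectories and keep one the verifier rewards.
    In training, rewarded samples would be reinforced, not just selected."""
    candidates = [noisy_model(task, temperature) for _ in range(n)]
    scored = [(verifier(task, c), c) for c in candidates]
    return max(scored)[1]  # highest-reward candidate wins

random.seed(0)
print(best_of_n([3, 5, 9]))  # prints the verified answer, 17
```

The inference-time knob Karpathy mentions shows up here as `n`: sampling more (or longer) trajectories buys accuracy with compute alone, no retraining needed.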
A key detail Karpathy emphasizes is the emergence of an extra control knob: for some tasks, quality can now be improved at inference time simply by giving the model more thinking time and longer trajectories.
2) “Ghosts” and jagged intelligence: sobriety instead of faith
One of the most useful parts of the text is the reminder that LLMs do not have to be “smooth” or “animal-like.” They were optimized to imitate text, to maximize rewards in verifiable domains, and to climb leaderboards — and they ended up exactly like that.
This defines the tension of 2025: models are simultaneously much smarter than we expected and strangely dumber — and this mixture does not magically resolve itself.
From this follows Karpathy’s almost cynical conclusion about benchmarks: since a benchmark is a verifiable environment, it is the first thing that gets overfit (directly or via synthetic data). As a result, “winning the table” correlates less and less with how a model behaves in your real, messy task.
3) The new LLM application layer: competition moved from “models” to “context”
This is arguably the most practical part of the review. Karpathy argues that the application layer decides the fate of the product by doing three things.
- It assembles context: not a prompt, but an engineered working memory.
- It orchestrates multiple model calls into chains and graphs, balancing cost and quality.
- It builds the interface plus an "autonomy slider" so that humans can dose risk.
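This division of labor can be made concrete with a deliberately tiny sketch. Nothing here is a real API: `call_llm` is a hypothetical stub for any model endpoint, and the keyword filter is a naive stand-in for proper retrieval — the structure, not the parts, is the point.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Hypothetical model call; a real app would hit an API here."""
    return f"<answer for: {prompt[:40]}>"

@dataclass
class AppLayer:
    autonomy: float  # 0.0 = human approves everything, 1.0 = fully autonomous

    def assemble_context(self, question, docs):
        """'Engineered working memory': keep only relevant snippets,
        here via a naive keyword match standing in for retrieval."""
        words = question.split()
        return "\n".join(d for d in docs if any(w in d for w in words))

    def run(self, question, docs):
        # Call 1: a draft grounded in the assembled context.
        ctx = self.assemble_context(question, docs)
        draft = call_llm(f"Context:\n{ctx}\n\nQ: {question}")
        # Call 2: a second pass that critiques and refines the draft.
        final = call_llm(f"Refine this draft: {draft}")
        # Autonomy slider: below a threshold, results wait for human sign-off.
        return {"result": final,
                "needs_human_approval": self.autonomy < 0.5}
```

Sliding `autonomy` past the threshold is the only change needed to move from "draft for review" to fully autonomous behavior — which is exactly why the slider belongs to the application layer, not the model.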
By 2025, the naïve idea that "just plug in the strongest model and it will work" was definitively buried. It will not work on its own. What works is knowing how to assemble context and control system behavior.
4) AI “lives” on your computer: this is strategy, not romance
Karpathy praises the approach where an agent sits next to your real environment: files, tokens, configs, repositories, low latency, and familiar UX. This is why he highlights Claude Code and explicitly criticizes early cloud agents that lived in sterile containers instead of the developer’s localhost.
In short: a “swarm of agents in the cloud” may be a beautiful endgame, but 2025 showed that in the messy intermediate world, the winning agent is the one that simply lives next to your real work.
5) Vibe coding: programming became a conversational craft — and that changes software economics
Karpathy almost casually coined the term “vibe coding,” and it turned into a cultural marker of the year. The point is not that “now everyone is a programmer.” The point is that the amount of software worth writing exploded.
Small utilities, disposable prototypes, fast hypothesis tests, draft programs — code became clay rather than marble.
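In that spirit, here is what a disposable evening utility looks like — a hypothetical script that lists the largest files under a directory. The point is not the code but its economics: unpolished, good enough for tonight, written to be thrown away.

```python
# Throwaway "one evening" utility: find the largest files under a folder.
import os
import sys

def largest_files(root=".", top=5):
    """Walk the tree and return (size, path) pairs, biggest first."""
    sizes = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                pass  # file vanished or unreadable; a throwaway tool shrugs
    return sorted(sizes, reverse=True)[:top]

if __name__ == "__main__":
    for size, path in largest_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{size:>12,}  {path}")
```

Five years ago a tool like this might not have been worth the interruption to write; when an LLM drafts it in seconds, the threshold for "worth writing" collapses — which is the economic shift behind vibe coding.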
6) Nano Banana and the birth of a GUI era for LLMs
Karpathy’s finale is unexpectedly strong. He argues that text chat is roughly equivalent to the command line of the 1980s. People struggle to read endless text, but easily grasp visual forms.
The next leap, therefore, is not “even smarter in chat,” but “speaking the user’s language”: images, diagrams, infographics, boards, mini-apps. As an early signal, he points to Google Nano Banana / Nano Banana Pro — models whose power lies in the combination of text, visuals, and world knowledge inside a single system.
This is not abstract: Google has official documentation describing Nano Banana as native image models within the Gemini ecosystem, positioned specifically for image creation and editing.
What this means for 2026
2026 is likely to be the year when winners are not those with “the smartest answers,” but those with the most reliable assemblies: systems where context is carefully constructed, agents are constrained, actions are logged, errors are caught, and UX does not force humans to read walls of text — it shows results.
And if this sounds like product management, that is exactly the point.
2025 turned AI into product engineering — not just ML magic.

