The Takeaway: The real frontier isn’t smarter prompts; it’s models that can work autonomously for days and learn from the world.
Key Insights
- OpenAI’s chief scientist treats coding, math, and physics as proving grounds, not endpoints: they’re valuable because they’re measurable, hard, and transferable to research.
- The next bottleneck is no longer raw intelligence alone; it’s teaching models to evaluate partial progress, sustain long-horizon work, and generalize beyond cleanly verifiable tasks.
- He’s skeptical that today’s RL pipelines are the final answer for business use cases, and he thinks in-context learning may become the more data-efficient path.
The Story
Jakub Pachocki, OpenAI’s chief scientist, is thinking less about flashy demos and more about what it takes for models to become real collaborators. He says the company’s internal shift is already visible in coding: “we use Codex for the majority of actual coding,” which he sees as evidence that autonomy is moving from theory into daily work.
For him, math benchmarks were never just trophies: they were a “North Star” because they’re brutally clear about success and failure. That same logic now extends to research. OpenAI is watching for models that can discover new things, not just answer questions, and he believes the jump from short tasks to long-horizon work is the key transition.
His view on alignment is equally pragmatic: the hard problem is generalization. Models need to learn what “good partial progress” looks like, especially when the task is messy, open-ended, or tied to the real world. That’s why he thinks the future of AI won’t be a single universal harness, but systems that “meet you where you are,” whether that’s Slack, code, or a scientific workflow.
The punchline: the next wave won’t just be more capable models. It’ll be models that can stay on task, adapt to context, and eventually run for “a couple days” with enough autonomy to produce genuinely useful work.