The Takeaway: AI progress feels sudden because models finally became reliable enough to be useful, not because capability jumped overnight.
- The big shift is from benchmark wins to messy real-world utility: “we moved from competitions to usefulness to users.”
- Reinforcement learning is escaping math and coding contests and starting to work on actual knowledge work, agentic coding, and scientific tasks.
- Efficiency matters as much as raw intelligence: the goal is to make models think less, backtrack faster, and deliver better answers with lower latency.
Yann Dubois, who co-leads OpenAI’s Post-Training Frontiers team, frames the current AI wave as a threshold crossing. He says the models didn’t suddenly get magical; they got dependable enough that people can trust them to do real work. That’s why the last few months have felt like a step function. Internally, the same models are also accelerating the people who build them, because coding tools now speed up research, training, and infrastructure work.
Dubois’s core point is that post-training is where the action moved. Early reinforcement learning was tuned for “verifiable rewards” like math problems and coding competitions. Now those same techniques are being pushed into ambiguous, high-value tasks where there isn’t a clean right answer. That’s a much bigger deal than another leaderboard win.
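To make "verifiable rewards" concrete, here is a minimal sketch of what such a reward can look like for a math problem. It is an illustration only, not OpenAI's implementation; the function names `parse_number` and `verifiable_reward` are hypothetical, and the exact-match scoring rule is an assumption.

```python
from fractions import Fraction


def parse_number(text):
    """Return the text as an exact Fraction, or None if it is not a number."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def verifiable_reward(model_answer, reference_answer):
    """Hypothetical reward: 1.0 if the answer matches the reference exactly, else 0.0.

    The reward is "verifiable" because a program can check it with no human
    judgment: either the numbers are equal or they are not.
    """
    got = parse_number(model_answer)
    want = parse_number(reference_answer)
    if got is None or want is None:
        return 0.0
    return 1.0 if got == want else 0.0


# "0.5" and "1/2" are the same number, so the answer earns full reward.
print(verifiable_reward("0.5", "1/2"))
# A free-text answer cannot be checked mechanically and earns nothing.
print(verifiable_reward("about half", "1/2"))
```

A research plan or a piece of knowledge work has no equivalent checker, which is why extending these techniques to tasks without a clean right answer is the harder step.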
He also makes a sharp distinction between raw compute and useful reasoning. Longer thinking helps, but only up to a point; the real win is getting models to choose better reasoning paths and recognize dead ends sooner. In his words, an expert doesn’t explore ten directions when one is obviously better. That’s the kind of efficiency OpenAI is chasing.
Dubois trained in biomedical engineering in Switzerland, then moved through NLP work in Singapore and a PhD at Stanford, and he is unusually focused on impact. Even his public note telling quant firms not to reach out says a lot: he wants to build systems that matter, not just systems that score well.