The Takeaway: Frontier AI gets safer only when teams build safety into the system, not when they hope scale will fix it.
- Bigger models improve capabilities by default, but robustness, jailbreak resistance, and misuse prevention do not reliably improve on their own.
- AI risk is not one problem: mistakes, harmful use, societal effects, and loss of control each need different defenses.
- Governance matters in practice; OpenAI’s safety committee can slow releases when the evidence isn’t good enough.
Zico Kolter, OpenAI board member and chair of its Safety and Security Committee, comes at AI safety from both sides: he runs Carnegie Mellon’s machine learning department and has spent years in AI security research. His core view is blunt: “You can’t just sort of trust models to get safer by getting bigger.” Capability scales fast; safety does not.
That’s why he pushes layered defenses instead of magical thinking. OpenAI’s Preparedness Framework, and similar frameworks at Anthropic and Google, set thresholds for risky capabilities in areas such as biology, cybersecurity, and AI self-improvement. But Kolter says that’s only one slice of the problem. The real safety stack also has to cover model behavior, user misuse, and the broader ecosystem as AI becomes embedded everywhere.
He’s especially skeptical of the lazy “wait for the next model” mindset. Waiting works for math or coding, where capability improves with scale. It doesn’t work for robustness. As he puts it, “to make models more robust, to make them broadly safer, you need to be explicit in training them for safety.” In other words: safety is engineered, not inherited.
Kolter also rejects the cartoonish “doomer vs. accelerationist” framing. He sees most serious researchers in the same camp: excited about the upside, unwilling to ignore the downside. The useful question isn’t whether AI is good or bad. It’s whether the industry is building the governance, monitoring, and release discipline to keep up with what these systems can now do.