AI Builders Brief — 2026-05-10

Follow builders, not influencers.

25+ builders tracked

TL;DR

Altman says Codex makes “work while I’m away” feel real. Steinberger has Codex fixing terminals and reviewing PRs. Levie argues agents widen access but experts still win, and Peter Yang warns that AI slop compounds fast if you stop editing.

BUILDER INSIGHTS

Sam Altman

Codex makes “work while I’m away” feel real

He says kicking off Codex tasks, going outside with his kid, and coming back at nap time to find everything done makes him genuinely optimistic about the future. The takeaway: AI agents aren’t just demos anymore; they’re starting to behave like useful async teammates.

#1 2.5k #2 6.2k #3 5.8k

Aaron Levie, CEO, Box

Agents widen access, but experts still win

He says agents will let way more people build software, do creative work, and explore fields that used to be out of reach. But the real edge still goes to experienced people, who can spot catastrophic mistakes, add context, and get far better output from the same tools.

321

Peter Yang

AI slop compounds fast if you stop editing

He says the real trap with agentic coding isn’t bad output — it’s letting 5% slop stack into a mess you no longer understand. His other gripe: Claude Code can sit there for minutes with zero feedback, which makes the whole experience feel broken even when it’s still working.

#1 229 #2 #3 961

Zara Zhang

AI should output artifacts, not editable files

She argues that when AI does the manipulation, the result should be optimized for human consumption — not for humans to keep pushing pixels around. Her point: people don’t read, they scan and react, so the winning format is beautiful, interactive artifacts instead of clunky docs or slides.

104

Nikunj Kothari, Partner, FPV Ventures

Models should plan in execution units, not human days

He argues AI planning is broken when models estimate work like a human team would. His fix: have the model estimate only the parts it can actually execute, in seconds and tool calls, and call out human-only blockers separately. It’s a practical prompt tweak for anyone trying to get more realistic plans out of LLMs.
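One way to try the prompt tweak described above is sketched here. The prompt wording, the `Step` and `Plan` types, and the sample numbers are all illustrative assumptions, not Kothari's actual prompt. The sketch only shows the shape of the idea: estimates in seconds and tool calls for model-executable steps, with human-only blockers kept in a separate list and left out of the totals.

```python
from dataclasses import dataclass

# Hypothetical prompt wording (an assumption, not the original prompt).
PLANNING_PROMPT = """\
Plan the task below as a list of steps.
For each step you can execute yourself, estimate:
  - seconds of execution time
  - number of tool calls
Do not estimate in human hours or days.
List anything only a human can do (approvals, credentials, decisions)
separately under "Human-only blockers", with no time estimate.

Task: {task}
"""


@dataclass
class Step:
    description: str
    est_seconds: int
    est_tool_calls: int


@dataclass
class Plan:
    steps: list[Step]
    human_blockers: list[str]

    def totals(self) -> tuple[int, int]:
        """Sum the model-executable estimates only; blockers are excluded."""
        return (
            sum(s.est_seconds for s in self.steps),
            sum(s.est_tool_calls for s in self.steps),
        )


def build_prompt(task: str) -> str:
    """Fill the task description into the planning prompt."""
    return PLANNING_PROMPT.format(task=task)


# Made-up example plan, as it might look after parsing a model's reply.
plan = Plan(
    steps=[
        Step("Read the failing test and related module", 20, 3),
        Step("Patch the parser and rerun tests", 90, 6),
    ],
    human_blockers=["Approve the production deploy"],
)
seconds, tool_calls = plan.totals()
```

Keeping blockers out of `totals()` is the point of the design: the summed estimate covers only what the model can run itself, and anything waiting on a person is surfaced on its own.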

Peter Steinberger, OpenClaw

Codex is now fixing terminals and reviewing PRs

He’s turning Codex into a real workflow tool: it now checks social signals in PR reviews, and his Crabbox setup can even use it to E2E-fix gifgrep so animated GIFs render in the terminal. He also says his Spotify CLI spogo got much faster, with Codex acting as his DJ.

#1 #2 102 #3 154

Garry Tan, CEO, Y Combinator

GBrain goes client-server with MCP support

GBrain v0.31.1 shipped real MCP thin client support, so you can run one home server and have everything else connect to it almost like it’s local. He’s pushing GBrain from a single-machine tool into a client-server setup, which is the kind of move that makes an internal AI stack feel much more usable.

#1 #2 256 #3 213

Dan Shipper, CEO, Every

Benchmarks miss the real work of prompting

He says benchmark scores only show what happens after a human has already done the hard part: finding the prompt that makes a model look good. That’s why he’s excited about Mythos — it looks capable, and the panic around it ignores how much expert human labor still sits between raw model power and useful output.

Swyx dxtipshq

Governments are waking up to AI agents

He says Singapore’s foreign minister is keynoting AIDotEngineer Singapore, with NanoClaw’s creator right after — a neat signal that governments are moving from AI curiosity to actual adoption. He frames it as proof that the international AI partnerships he’s been pushing are landing, with the UK’s Chief AI Officer and Singapore’s cabinet minister now both in the mix.

BLOG UPDATES

Anthropic Engineering

An update on recent Claude Code quality reports

Anthropic fixes three Claude Code regressions and resets limits

Lead: Anthropic says recent quality complaints about Claude Code came from three separate product changes—not model degradation—and all have been fixed, with the API and inference layer unaffected.

Numbers:

Three issues affected Claude Code, the Claude Agent SDK, and Claude Cowork on different timelines.
Fixes landed on April 10 and April 20 (v2.1.116), and usage limits were reset for all subscribers on April 23.
A prompt ablation showed a 3% drop in evals for both Opus 4.6 and 4.7.

So What: The company is tightening release discipline around prompt changes, model-specific gating, broader evals, soak periods, and gradual rollouts so intelligence regressions are caught earlier. The practical takeaway for builders is that Claude Code’s defaults are back to higher-effort settings—“All users now default to xhigh effort for Opus 4.7, and high effort for all other models”—and Anthropic is expanding internal code review context to catch bugs like the caching issue that caused Claude to “seem forgetful and repetitive.”

Claude Blog

New connectors in Claude for everyday life

Claude adds everyday-life connectors for travel, shopping, and more

Lead: Claude is expanding its connector ecosystem beyond work tools to include everyday apps like AllTrails, Instacart, Audible, Tripadvisor, TurboTax, Uber, and more, so users can do more inside a single conversation.

Numbers:

Claude’s directory has grown to 200+ connectors since launching in July 2025.
New connectors are available across all plans, with mobile in beta.

So What: This turns Claude into a more useful action layer for daily life: it can suggest the right app based on your context, surface multiple relevant connectors when needed, and keep the workflow in-thread. Anthropic says, “Claude suggests connectors and makes recommendations. But you stay in control of its actions,” and it will ask before booking or purchasing on your behalf. Privacy remains a selling point: “Your data from that app isn’t used to train our models,” and connectors can be disconnected anytime. For builders, the takeaway is clear: if your product fits travel, shopping, finance, or local services, you can submit it to Claude’s directory and reach users where they already work and plan.

Claude Blog

Built-in memory for Claude Managed Agents

Claude Managed Agents gets built-in cross-session memory

Lead: Claude Managed Agents now offers memory in public beta, letting agents learn across sessions through a filesystem-based layer that developers can export, manage via API, and control end to end.

Numbers:

Rakuten says memory cut first-pass errors by 97%.
Wisedocs reports verification is 30% faster.
Memory supports multiple agents working concurrently against the same store with scoped access.

So What: This turns Managed Agents into a stronger production option for long-running workflows: agents can retain useful context, share learnings, and avoid repeating mistakes without custom retrieval plumbing. Anthropic says the system is optimized so “our latest models save more comprehensive, well-organized memories and are more discerning about what to remember.” For builders, the practical upside is simpler state management with enterprise controls: read/write scopes, audit logs, rollback, redaction, and console-visible session events. Teams like Netflix, Rakuten, Wisedocs, and Ando are already using it to carry context forward, close feedback loops, and replace bespoke memory infrastructure.
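To make the idea of a shared, filesystem-based store with scoped access concrete, here is a toy local sketch. It is not Anthropic's API: the `ScopedMemory` class, its methods, the agent names, and the one-file-per-key layout are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class ScopedMemory:
    """Toy file-backed memory store with per-agent read/write scopes.

    Hypothetical illustration only; not the Managed Agents memory API.
    """

    def __init__(self, root: Path, scopes: dict[str, set[str]]):
        self.root = root
        self.scopes = scopes  # agent name -> subset of {"read", "write"}

    def _check(self, agent: str, action: str) -> None:
        # Reject the call if the agent was not granted this scope.
        if action not in self.scopes.get(agent, set()):
            raise PermissionError(f"{agent} lacks {action} scope")

    def write(self, agent: str, key: str, text: str) -> None:
        self._check(agent, "write")
        (self.root / f"{key}.md").write_text(text, encoding="utf-8")

    def read(self, agent: str, key: str) -> str:
        self._check(agent, "read")
        return (self.root / f"{key}.md").read_text(encoding="utf-8")


# Two agents share one store: one may write, the other may only read.
root = Path(tempfile.mkdtemp())
store = ScopedMemory(root, {"writer": {"read", "write"}, "reviewer": {"read"}})
store.write("writer", "lessons", "Validate invoice totals before submitting.")
note = store.read("reviewer", "lessons")
```

Because each memory is a plain file in this sketch, exporting or inspecting the store is just a directory copy. A production system would add the audit logs, rollback, and redaction controls the post describes.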

PODCAST HIGHLIGHTS

Training Data

ElevenLabs' Mati Staniszewski: How Voice Becomes the Interface for Everything

Voice is becoming the interface for AI—and trust is the moat

The Takeaway: The real opportunity in audio isn’t just better speech synthesis; it’s making voice the trusted control layer for agents, devices, and services.

ElevenLabs won by entering audio early, staying lean, and monetizing fast instead of burning billions on a giant model bet.
The biggest near-term wins aren’t flashy consumer demos—they’re voice agents that remove friction in support, sales, government, education, and healthcare.
The hard problem isn’t only sounding human; it’s emotional intelligence, trust, and domain-specific reliability when agents start acting on your behalf.

Mati Staniszewski, cofounder of ElevenLabs, built the company with his childhood friend Piotr after growing up in Poland and noticing how bad dubbing was: foreign films were narrated by one monotone voice, no matter who was speaking. That annoyance turned into a thesis: people should be able to speak any language with the same emotion and intonation, and voice will eventually be the primary interface for a world full of software, devices, and robots.

What’s striking is how unglamorous the company’s strategy was at the start. In 2022, audio was still a niche, so the team hired remotely, scraped GitHub for researchers, and shipped quickly enough to generate revenue before scaling the model work. As Mati put it, they focused on “figuring out that stream and be able to be independent.”

The product roadmap followed the workflow, not the hype: text-to-speech, speech-to-text, dubbing, real-time voice agents, and now music. But the next frontier is more subtle. The breakthrough won’t just be perfect cloning; it will be agents that can detect stress, slow down, reassure, interrupt, and adapt. That’s why he thinks trust will matter more than raw intelligence: “You will detect for real authenticated AI in the future and assume it’s fake.”

For founders, the lesson is simple: the moat in AI may not be the model alone—it’s the workflow, the data, and the trust layer around it.
