AI Builders Brief

Follow builders, not influencers.

2026.04.17

25+ builders tracked

TL;DR

Anthropic launched Managed Agents to decouple an agent’s “brain” from its tools and sandbox, while Claude Code now defaults to xhigh effort and got an updated /usage flow. Rauch said agents need durability over clever prompts, and Swyx split AI engineering into slop vs rigor.

BUILDER INSIGHTS
12
01
Alex Albert (AnthropicAI)

Opus 4.7 gets more predictable, more polished

He says Opus 4.7 is notably better at async work, instruction-following, and token control, with a new xhigh effort level that makes output more predictable. It also stops downscaling high-res images and shows more taste in UIs, slides, and docs — the kind of quality-of-life upgrades that matter in real workflows.

02
Cat Wu (anthropicai)

Claude Code now defaults to xhigh effort

Cat Wu says Claude Code now defaults to xhigh effort for Opus 4.7, with `/effort` letting you dial it up or down. The bigger point: the model is getting better at verifying its own changes, so the team is pushing users to bake testing workflows into `CLAUDE.md` or a `/verify-app` skill.

03
Guillermo Rauch (CEO, vercel)

Agents need durability, not just clever prompts

He says the real hard part of agents and backends is durability: models go down, APIs rate-limit, databases stall, and you still get paged. His pitch is that Workflow SDK gives backend apps the same kind of reliability Next.js brought to frontend dev, with self-hosting and multi-cloud from day one.
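
A minimal sketch of what "durability" can mean for a single backend step, assuming two common ingredients: retries with exponential backoff for transient failures, and a checkpoint so a finished step is not re-run after a crash. This is not the Workflow SDK's API; `run_durable_step`, `TransientError`, and the JSON checkpoint file are hypothetical names chosen for illustration.

```python
import json
import random
import time
from pathlib import Path
from typing import Callable


class TransientError(Exception):
    """Hypothetical stand-in for a model outage, rate limit, or stalled database."""


def run_durable_step(
    name: str,
    fn: Callable[[], str],
    checkpoint: Path,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run fn at most once per step name, retrying transient failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Load results of steps that completed in an earlier run, if any.
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    if name in done:
        # The step already succeeded: replay the stored result.
        return done[name]

    for attempt in range(1, max_attempts + 1):
        try:
            result = fn()
            break
        except TransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with a little jitter between attempts.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))

    # Persist the result so a restart skips this step.
    done[name] = result
    checkpoint.write_text(json.dumps(done))
    return result
```

Calling `run_durable_step("fetch", call_model, Path("run.json"))` twice would invoke the hypothetical `call_model` only once if the first call succeeds. A production system would also need idempotent side effects and a real datastore instead of a local file.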

04
Amjad Masad (CEO, replit)

Web apps can become iOS apps for under $10

He says you can turn a web app into an iOS app for less than $10, which is the kind of distribution hack founders actually care about. The other posts are Replit promos — 50% off for running parallel agents faster, plus EU deployment — but the main signal is still: ship once, reach mobile cheaply.

05
Aaron Levie (CEO, box)

Codex turns enterprise content into agent fuel

The new Codex is another step toward agents that can code, use tools, and run long tasks in the background for knowledge workers. He says the Box plugin makes that especially powerful, since it lets enterprise content flow across apps to automate things like reports, data rooms, contracts, onboarding, and invoices.

06
Ryo Lu (Cursor_ai)

Cursor’s design stack is becoming its own product

He says he uses different models for different jobs: Opus 4.7 for planning, Composer 2 for building and iterating, and Codex/GPT-5.4 for the nasty bugs. That’s a pretty clear signal Cursor is turning model choice into workflow design, not just a chat box with autocomplete.

He also teased Baby Glass, a new prototyping environment with @flowstated that lets designers remix ideas in code using the same shared components behind Cursor 3’s interface.

07
Kevin Weil (VP, OpenAI)

OpenAI ships a science model for real labs

OpenAI launched GPT-Rosalind, a frontier model tuned for biology, drug discovery, and translational medicine, with built-in knowledge of the databases and tools researchers actually use. Access is limited to qualified customers because of bio-safety concerns, while a Life Sciences plugin for Codex is shipping to everyone.

08
Thariq (anthropicai)

Claude Code gets a usage-focused upgrade

Thariq says the team is rolling out updates to Claude Code’s `/usage` flow, based on conversations with users about getting more out of the tool. Anthropic is also adding a curated “What’s New” docs section and monthly “what we shipped” webinars, which feels like a push to make the product easier to follow as it moves fast.

09
Zara Zhang

HTML is becoming the playground for agents

She says the best use of time is deep talk, deep read, and deep play — especially the kind of random AI experiments that lead somewhere unexpected. Her bigger point: HTML is now the medium agents can actually work in, and she’s already turning frontend slides into video to prove it.

10
Peter Yang

AI workflows need a second agent to grade the first

He says the practical move is to spin up a separate eval agent that runs simple yes/no checks on another agent’s output and sends the work back until every check passes. He’s already using that pattern for YouTube thumbnails and titles, a neat glimpse of how AI work is turning into agent-on-agent QA instead of one-shot prompting.
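
The grader pattern can be sketched as a small loop: one function produces a draft, a second applies named yes/no checks, and the failed check names are fed back until nothing fails or a round limit is hit. This is an illustration under assumptions, not Peter Yang's actual setup. In practice both `generate` and each check would likely be model calls; here `toy_generator`, `CHECKS`, and the specific checks are hypothetical stand-ins so the loop runs on its own.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[str], bool]


def grade(output: str, checks: Dict[str, Check]) -> List[str]:
    """Eval agent: return the names of the yes/no checks the output fails."""
    return [name for name, check in checks.items() if not check(output)]


def generate_until_pass(
    generate: Callable[[List[str]], str],
    checks: Dict[str, Check],
    max_rounds: int = 5,
) -> Tuple[str, List[str]]:
    """Ask the worker for output, grade it, and retry with the failures as feedback."""
    failures: List[str] = []
    output = ""
    for _ in range(max_rounds):
        output = generate(failures)
        failures = grade(output, checks)
        if not failures:
            break
    # Any remaining failures are returned so the caller can escalate to a human.
    return output, failures


# Hypothetical checks for a video title; real ones would be model-graded.
CHECKS: Dict[str, Check] = {
    "under_60_chars": lambda title: len(title) <= 60,
    "not_all_caps": lambda title: not title.isupper(),
}


def toy_generator(failures: List[str]) -> str:
    """Stand-in for the worker agent: shortens the title when told it is too long."""
    if "under_60_chars" in failures:
        return "Ship Agents Faster"
    return (
        "How I Used Two AI Agents to Make Every Single "
        "YouTube Thumbnail and Title Better"
    )


title, remaining = generate_until_pass(toy_generator, CHECKS)
```

Keeping each check binary and named makes the feedback to the worker unambiguous, and the round limit keeps a stubborn failure from looping forever.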

11
Garry Tan (CEO, ycombinator)

GBrain Voice is getting its own e2e Gemini tests

He says he and his team had to build proper end-to-end Gemini Live tests themselves, and that work is headed into GBrain Voice with an open-source release soon. The other updates are just maintenance: more /ship robustness and security fixes for GBrain.

12
Swyx (dxtipshq)

AI engineering is splitting into slop vs rigor

He says the biggest divide in AI engineering right now is between the fast-and-loose “slop cannons” and the more disciplined builders. Putting those two camps on separate days at the @aiDotEngineer talks is, in his view, accidentally a pretty accurate map of where the field is headed.

BLOG UPDATES
2
Anthropic Engineering

Scaling Managed Agents: Decoupling the brain from the hands

Anthropic launches Managed Agents to decouple agent infrastructure

Lead: Anthropic introduced Managed Agents, a hosted Claude Platform service for running long-horizon agents with a durable session log, stateless harness, and isolated sandbox so the system can evolve without rewiring the whole stack.

Numbers:

  • p50 time-to-first-token dropped roughly 60%.
  • p95 time-to-first-token dropped over 90%.
  • The architecture supports many brains and many hands, with tools exposed through a simple `execute(name, input) -> string` interface.

So What: For builders, the big shift is architectural: Claude’s “brain” no longer has to live inside the same container as its tools, state, or credentials. That improves reliability, makes debugging possible, and lets teams connect agents to external infrastructure like a VPC without peering everything into one box. It also tightens security by keeping tokens out of the sandbox and routing access through vault-backed proxies or bundled auth. Anthropic’s core message is that harness assumptions age quickly as models improve: “We expect harnesses to continue evolving.” Managed Agents is meant to be the stable layer underneath that evolution, while still supporting context recovery, retries, and custom tools via MCP.
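
To make the "brain versus hands" split concrete, here is a minimal sketch built around the one detail the post gives, the `execute(name, input) -> string` tool interface. Everything else is an assumption for illustration: `Hands`, `LocalHands`, `SessionLog`, and `harness_step` are hypothetical names, and the in-memory list stands in for a durable session log. It is not Anthropic's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class Hands(Protocol):
    """Anything that can run a tool; signature taken from the post."""

    def execute(self, name: str, input: str) -> str: ...


@dataclass
class LocalHands:
    """Hypothetical in-process tool runner; a real one might be a remote sandbox."""

    tools: Dict[str, Callable[[str], str]]

    def execute(self, name: str, input: str) -> str:
        if name not in self.tools:
            # Errors come back as strings so the model can read and react to them.
            return f"error: unknown tool {name!r}"
        return self.tools[name](input)


@dataclass
class SessionLog:
    """Assumed append-only event log; in-memory here, durable in a real system."""

    events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)


def harness_step(log: SessionLog, hands: Hands, name: str, input: str) -> str:
    """Stateless harness step: all state lives in the log, none in the harness."""
    log.append({"type": "tool_call", "name": name, "input": input})
    output = hands.execute(name, input)
    log.append({"type": "tool_result", "name": name, "output": output})
    return output
```

Because the harness holds no state of its own, a crashed harness could in principle be replaced by a fresh one that reads the same log, and the same `Hands` interface could front a container, a VPC-hosted service, or an MCP server.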

Claude Blog

Claude Managed Agents: get to production 10x faster

Claude launches Managed Agents for faster production deployment

Lead: Claude Managed Agents is now in public beta, offering composable APIs and hosted infrastructure to build, run, and govern cloud agents in days instead of months.

Numbers:

  • Claimed to help teams ship agents “10x faster”
  • Internal testing on structured file generation improved task success by up to 10 points versus a standard prompting loop
  • Long-running sessions can persist for hours, even across disconnections
  • Some customer integrations shipped in weeks instead of months; one team deployed specialist agents within a week

So What: Builders can skip the hardest parts of productionizing agents—sandboxing, checkpointing, permissions, auth, tracing, and orchestration—and focus on the user experience. Claude runs the agent harness, manages tool use, context, and recovery, while the Console exposes session tracing, integration analytics, and troubleshooting. The platform also supports multi-agent coordination in research preview and lets teams choose between autonomous execution and tighter prompt-and-response control. As the post puts it, “You define your agent’s tasks, tools, and guardrails and we run it on our infrastructure.” For teams shipping coding, productivity, finance, or legal agents, this means faster launches and less custom infra work.

PODCAST HIGHLIGHTS
1

Notion bets on agents, but only after the models earn it

The Takeaway: Notion’s edge isn’t hype—it’s knowing when to wait, when to rebuild, and when to ship anyway.

  • They rebuilt custom agents four or five times because the models were “too dumb” and the context was too short; patience beat premature polish.
  • Their real advantage is not raw AI capability but product judgment: “not swimming upstream” and spotting when the river changes direction.
  • The company treats agents as the future interface, so every product team now has to build for both humans and agents—not just bolt AI on top.

Simon Last, Notion’s cofounder, and Sarah Sachs, who leads much of the AI org, describe a team that’s been grinding on agents since late 2022. Early attempts failed because tool calling didn’t really exist yet, and even when it did, reliability wasn’t good enough for background work. The breakthrough came later, but the lesson wasn’t “wait for better models” so much as “learn to read the moment”: build ahead of capability, but don’t keep forcing a dead end.

Sarah’s framing is the sharpest: the job is to keep the company from “swimming upstream,” while also preparing for the current to shift. That shows up in how Notion runs AI. They don’t worship hackathons, but they do use them to spread fluency. They don’t rely on top-down ideas; they let prototypes from curious builders become real products. And they don’t treat evals as bureaucracy—they’ve built an “agent dev velocity” org so teams can own their own tests and keep shipping safely.

The result is a culture where “demos over memos” isn’t a slogan, it’s the operating system. Notion’s bet is that the software factory future won’t come from one giant agent, but from a lot of small, well-instrumented ones working inside a product people already trust.

ARCHIVE
2026-04-16 14 items

Rauch said teams were building their own design factories, while Steinberger called open-source AI security a full-time arms race. Masad priced OSS trust in compute, and Woodward shipped Gemini on Mac in 100 days.

2026-04-15 15 items

Woodward said Gemini’s turning into a test-prep machine, Albert called Claude Code the whole workspace, and Cat Wu shipped a desktop control center with parallel sessions and review tools. Rauch also argued agent builders need elastic Postgres, not vibes.

2026-04-14 10 items

Rauch said the moat moved from code to the code factory, while Levie argued every team now needed an agent wrangler. Cursor leaned into customizable multi-agent views, Replit added region controls, and No Priors backed Periodic Labs’ bet that AI could learn atoms by running experiments.

2026-04-13 10 items

Amjad Masad said Apple’s 50th has turned into a PR disaster, while Aaron Levie argued agents would create more work, not cut jobs. Rauch pushed engineers into the customer hot seat, and Claude warned teams to harden security fast.

2026-04-12 11 items

Thariq said Claude Code now handles TurboTax pain, while Rauch called microVM sandboxes the new compute layer. Aditya Agarwal pushed memory over loops, and Levie argued AI won’t shrink law—it’ll inflate it.

2026-04-11 16 items

Claude pushed into Word with tracked edits, and Claude Code moved planning to the web with auto mode approvals. Garry Tan called agents the Altair BASIC era, while Aaron Levie warned software without a real API gets left behind.

2026-04-10 12 items

Karpathy said free ChatGPT lagged while frontier coding models didn’t. Albert pushed cheap-to-smart escalation, Rauch said cloud infra went agent-native, and OpenAI’s next leap looked like autonomy—not chat.

2026-04-09 16 items

Woodward gave Gemini a second brain with Notebooks, while Anthropic shipped Managed Agents to move Claude from prompt to production. Rauch called the web AI’s native OS, and Levie, Masad, and Shipper all bet agents will do the work, not the people.

2026-04-08 12 items

Albert teased Anthropic’s Mythos Preview, Cat Wu juiced Claude Code’s CLI tricks, and Peter Steinberger patched CodexBar with 2 providers plus billing fixes. Levie said agents are eating knowledge work, while Nikunj Kothari preached retention over launch hype.

2026-04-07 8 items

Levie said agents won’t erase work, just push it up a layer; Yang argued they’ll shrink teams, not ambition. Garry Tan flagged an unpatched file leak in Claude’s coding env, while Kothari called Anthropic’s revenue ramp absurdly fast.

2026-04-06 10 items

Rauch said v0 now builds physics, not just UI, while Karpathy noted GitHub Gists have weirdly good comments. Levie argued AI efficiency creates more work, not less, and Tan called open source’s golden age.

2026-04-05 4 items

Karpathy pushed “your data, your files, your AI.” Levie argued context beat raw model IQ in enterprise AI. Garry Tan said GStack kept shipping security fixes fast, while No Priors spotlighted Periodic Labs’ bet on atoms, not just text.

2026-04-04 9 items

Claude plugged into Microsoft 365 everywhere, Swyx said Devin one-shot blog-to-code, and Peter Steinberger called out GitHub’s API as still not built for agents. Aaron Levie hit the context wall, while Garry Tan shipped a DX review tool from his own stack.

2026-04-03 10 items

Claude landed computer use on Windows, Karpathy argued LLMs should build your wiki, and Amjad Masad pushed Replit deeper into enterprise sales. Peter Yang said Cursor 3 got out of the agent’s way, while Peter Steinberger warned AI slop was flooding kernel security with real bugs.

2026-04-02 12 items

Steinberger called plan mode training wheels, while Thariq gave Claude Code a mouse-friendly renderer and Cat Wu showed sessions jumping phone-to-laptop. Masad framed Replit as an OS for agents, Rauch said Vercel signups compounded fast, and Anthropic’s infra tweaks swung coding scores by 6 points.

2026-04-01 4 items

Levie said AI productivity hit the enterprise risk wall, while Weil argued proofs got cleaner, not just better. Agarwal floated public source code as the new prod debugging, and Data Driven NYC claimed one founder could run a company if agents handled the layers below.

2026-03-31 15 items

Karpathy warned unpinned deps can turn one hack into mass pwnage, while Rauch and Levie said agents still need human guardrails and redesigned workflows. Meanwhile Claude Code got enterprise auto mode, Replit added built-in monetization, and Swyx spotted “Sign in with ChatGPT” already live.

2026-03-29 7 items

Andrej Karpathy highlighted that LLMs can argue any side and suggested treating that as a feature. Guillermo Rauch finally shipped his dream text layout. Meanwhile, Amjad Masad claimed AI is democratizing app building and elevating top engineers.

2026-03-28 7 items

Andrej Karpathy suggested leveraging LLMs' ability to argue any side as a feature. Guillermo Rauch turned text layout dreams into reality with Vercel's latest feature. Meanwhile, Amjad Masad claimed AI is democratizing app building, liberating top engineers for bigger challenges.