AI Builders Brief

Follow builders, not influencers.

2026.04.17

25+ builders tracked

TL;DR

Anthropic launched Managed Agents to decouple agent infra, while Claude Code defaulted to xhigh effort and got a usage-focused upgrade. Rauch said agents need durability over clever prompts, and Swyx split AI engineering into slop vs rigor.

BUILDER INSIGHTS
12
01
Alex Albert (AnthropicAI)

Opus 4.7 gets more predictable, more polished

He says Opus 4.7 is notably better at async work, instruction-following, and token control, with a new xhigh effort level that makes output more predictable. It also no longer downscales high-resolution images and shows more taste in UIs, slides, and docs: the kind of quality-of-life upgrades that matter in real workflows.

X
02
Cat Wu (anthropicai)

Claude Code now defaults to xhigh effort

They said Claude Code now defaults to xhigh effort for Opus 4.7, with `/effort` letting you dial it up or down. The bigger point: the model is getting better at verifying its own changes, so they’re pushing teams to bake testing workflows into `CLAUDE.md` or a `/verify-app` skill.

X
03
Guillermo Rauch (CEO, vercel)

Agents need durability, not just clever prompts

He says the real hard part of agents and backends is durability: models go down, APIs rate-limit, databases stall, and you still get paged. His pitch is that Workflow SDK gives backend apps the same kind of reliability Next.js brought to frontend dev, with self-hosting and multi-cloud from day one.

X
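Rauch doesn’t share code, so the sketch below is not the Workflow SDK API. It is a minimal Python illustration, under our own assumptions, of the durability pattern he describes: each named step is retried with exponential backoff on transient failures (`TransientError` stands in for a model outage, a rate limit, or a stalled database), and every finished result is checkpointed to a JSON-lines log so a restarted process skips work it already completed. `run_step`, `TransientError`, and the log format are hypothetical.

```python
import json
import random
import tempfile
import time
from pathlib import Path
from typing import Callable


class TransientError(Exception):
    """Stand-in for a model outage, a 429 rate limit, or a stalled database call."""


def run_step(log_path: Path, name: str, fn: Callable[[], str],
             max_attempts: int = 5, base_delay: float = 0.1) -> str:
    """Run a named step to completion once, retrying transient failures.

    Finished results are appended to a JSON-lines checkpoint log, so a
    restarted process replays completed steps instead of re-running them.
    """
    # 1. Replay: if an earlier run already finished this step, reuse its result.
    if log_path.exists():
        for line in log_path.read_text().splitlines():
            record = json.loads(line)
            if record["step"] == name:
                return record["result"]

    # 2. Execute, backing off exponentially (with jitter) on transient errors.
    attempt = 0
    while True:
        attempt += 1
        try:
            result = fn()
        except TransientError:
            if attempt >= max_attempts:
                raise  # out of retries: surface it to the caller (and the pager)
            time.sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
            continue
        # 3. Checkpoint the result; after this write, a crash no longer loses the step.
        with log_path.open("a") as f:
            f.write(json.dumps({"step": name, "result": result}) + "\n")
        return result


if __name__ == "__main__":
    calls = 0

    def flaky_model_call() -> str:
        global calls
        calls += 1
        if calls < 3:
            raise TransientError("429: rate limited")
        return "summary v1"

    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "workflow.jsonl"
        print(run_step(log, "summarize", flaky_model_call))  # succeeds on the 3rd attempt
        print(run_step(log, "summarize", flaky_model_call))  # replayed from the log
        print(calls)  # 3: the second call never reached the model
```

One design consequence: a crash after a step finishes but before its checkpoint is written re-runs that step, so steps should be idempotent (safe to run twice).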
04
Amjad Masad (CEO, replit)

Web apps can become iOS apps for under $10

He says you can turn a web app into an iOS app for less than $10, which is the kind of distribution hack founders actually care about. His other posts are Replit promos (50% off for running parallel agents faster, plus EU deployment), but the main signal is still: ship once, reach mobile cheaply.

X
05
Aaron Levie (CEO, box)

Codex turns enterprise content into agent fuel

The new Codex is another step toward agents that can code, use tools, and run long tasks in the background for knowledge workers. He says the Box plugin makes that especially powerful, since it lets enterprise content flow across apps to automate things like reports, data rooms, contracts, onboarding, and invoices.

X
06
Ryo Lu (Cursor_ai)

Cursor’s design stack is becoming its own product

He says he uses different models for different jobs: Opus 4.7 for planning, Composer 2 for building and iterating, and Codex/GPT-5.4 for the nasty bugs. That’s a pretty clear signal Cursor is turning model choice into workflow design, not just a chat box with autocomplete.

He also teased Baby Glass, a new prototyping environment with @flowstated that lets designers remix ideas in code using the same shared components behind Cursor 3’s interface.

X
07
Kevin Weil (VP, OpenAI)

OpenAI ships a science model for real labs

OpenAI launched GPT-Rosalind, a frontier model tuned for biology, drug discovery, and translational medicine, with built-in knowledge of the databases and tools researchers actually use. Access is gated to qualified customers because of biosafety concerns, while a Life Sciences plugin for Codex is shipping to everyone.

X
08
Thariq (anthropicai)

Claude Code gets a usage-focused upgrade

They’re rolling out updates to Claude Code’s `/usage` flow after talking with users about helping them get more out of the tool. Anthropic is also adding a curated “What’s New” docs section and monthly “what we shipped” webinars, which feels like a push to make the product easier to follow as it moves fast.

X
09
Zara Zhang

HTML is becoming the playground for agents

She says the best use of time is deep talk, deep read, and deep play — especially the kind of random AI experiments that lead somewhere unexpected. Her bigger point: HTML is now the medium agents can actually work in, and she’s already turning frontend slides into video to prove it.

X
10
Peter Yang

AI workflows need a second agent to grade the first

He says the practical move is to spin up a separate eval agent that runs simple yes/no checks on another agent’s output, then keeps that first agent working until every check passes. He’s already using the pattern for YouTube thumbnails and titles, which is a neat glimpse of how AI work is turning into agent-on-agent QA instead of one-shot prompting.

X
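Here is a minimal sketch of the eval-agent loop Yang describes, assuming both “agents” are stubbed in plain Python: `ToyTitleWriter` stands in for the LLM doing the work, and `eval_agent` stands in for the grader that answers a list of yes/no checks. In a real setup each would be a separate model call; the check questions, class and function names, and the five-round cap are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# A check is one yes/no question about an output: (question, predicate).
Check = tuple[str, Callable[[str], bool]]

CHECKS: list[Check] = [
    ("Under 60 characters?", lambda title: len(title) <= 60),
    ("Starts with a capital letter?", lambda title: title[:1].isupper()),
]


@dataclass
class Verdict:
    passed: bool
    failures: list[str]


def eval_agent(output: str, checks: list[Check]) -> Verdict:
    """The grader: answers every yes/no check and lists the ones that failed."""
    failures = [question for question, ok in checks if not ok(output)]
    return Verdict(passed=not failures, failures=failures)


def generate_until_pass(generate: Callable[[list[str]], str],
                        checks: list[Check], max_rounds: int = 5) -> tuple[str, int]:
    """Feed failed checks back to the worker until the grader passes everything."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        verdict = eval_agent(draft, checks)
        if verdict.passed:
            return draft, round_no
        feedback = verdict.failures
    raise RuntimeError(f"still failing after {max_rounds} rounds: {feedback}")


class ToyTitleWriter:
    """Hypothetical stand-in for an LLM worker: fixes one flagged issue per round."""

    def __init__(self) -> None:
        self.draft = "my daily workflow for writing youtube titles with a second agent as the grader"

    def __call__(self, feedback: list[str]) -> str:
        if feedback:
            issue = feedback[0]
            if issue == "Under 60 characters?":
                self.draft = "my youtube title workflow with a grader agent"
            elif issue == "Starts with a capital letter?":
                self.draft = self.draft[:1].upper() + self.draft[1:]
        return self.draft


if __name__ == "__main__":
    title, rounds = generate_until_pass(ToyTitleWriter(), CHECKS)
    print(rounds, title)  # 3 My youtube title workflow with a grader agent
```

Capping the rounds matters: `max_rounds` keeps a worker that can’t satisfy a check from looping indefinitely, and the raised error tells you which checks it kept failing.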
11
Garry Tan (CEO, ycombinator)

GBrain Voice is getting its own e2e Gemini tests

He says he and his team had to build proper end-to-end Gemini Live tests themselves, and that work is headed into GBrain Voice with an open-source release soon. The other updates are just maintenance: more `/ship` robustness and security fixes for GBrain.

X
12
Swyx (dxtipshq)

AI engineering is splitting into slop vs rigor

He says the biggest divide in AI engineering right now is between the fast-and-loose “slop cannons” and the more disciplined builders. Putting those two camps on separate days at the @aiDotEngineer talks is, in his view, accidentally a pretty accurate map of where the field is headed.

X
BLOG UPDATES
2
Anthropic Engineering

Scaling Managed Agents: Decoupling the brain from the hands

Anthropic launches Managed Agents to decouple agent infrastructure

Lead: Anthropic introduced Managed Agents, a hosted Claude Platform service for running long-horizon agents with a durable session log, stateless harness, and isolated sandbox so the system can evolve without rewiring the whole stack.

Numbers:

  • p50 time-to-first-token dropped roughly 60%.
  • p95 time-to-first-token dropped over 90%.
  • The architecture supports many brains and many hands, with tools exposed through a simple `execute(name, input) -> string` interface.
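The post gives only the interface shape, `execute(name, input) -> string`. Assuming nothing beyond that, the sketch below shows why such a narrow contract lets you add or swap “hands” without touching the “brain”: every tool, whether it runs in a local sandbox or in a service inside a VPC, takes a string and returns a string, and failures come back as text the model can read. `ToolBox` and `HandsRouter` are hypothetical names, not Anthropic’s SDK.

```python
from typing import Callable


class ToolBox:
    """One set of 'hands': named tools that each map a string to a string."""

    def __init__(self, tools: dict[str, Callable[[str], str]]) -> None:
        self._tools = dict(tools)

    def names(self) -> set[str]:
        return set(self._tools)

    # `input` mirrors the execute(name, input) -> string shape from the post.
    def execute(self, name: str, input: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            return f"error: unknown tool {name!r}"
        try:
            return tool(input)
        except Exception as exc:  # report failures to the brain as plain text
            return f"error: {type(exc).__name__}: {exc}"


class HandsRouter:
    """Many hands behind one interface: route each call to the box that owns it."""

    def __init__(self, *boxes: ToolBox) -> None:
        self._boxes = boxes

    def execute(self, name: str, input: str) -> str:
        for box in self._boxes:
            if name in box.names():
                return box.execute(name, input)
        return f"error: no hands provide {name!r}"


if __name__ == "__main__":
    sandbox = ToolBox({"word_count": lambda text: str(len(text.split()))})
    inventory = {"sku-1": "12 in stock"}
    vpc_service = ToolBox({"lookup": lambda key: inventory[key]})  # pretend this lives in a VPC
    hands = HandsRouter(sandbox, vpc_service)

    print(hands.execute("word_count", "decouple the brain from the hands"))  # 6
    print(hands.execute("lookup", "sku-1"))  # 12 in stock
    print(hands.execute("lookup", "sku-9"))  # error: KeyError: 'sku-9'
    print(hands.execute("deploy", "prod"))   # error: no hands provide 'deploy'
```

Because the brain only ever sees strings, a new environment is just another `ToolBox` passed to the router; nothing on the model side changes.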

So What: For builders, the big shift is architectural: Claude’s “brain” no longer has to live inside the same container as its tools, state, or credentials. That improves reliability, makes debugging possible, and lets teams connect agents to external infrastructure like a VPC without peering everything into one box. It also tightens security by keeping tokens out of the sandbox and routing access through vault-backed proxies or bundled auth. Anthropic’s core message is that harness assumptions age quickly as models improve: “We expect harnesses to continue evolving.” Managed Agents is meant to be the stable layer underneath that evolution, while still supporting context recovery, retries, and custom tools via MCP.
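The point about keeping tokens out of the sandbox can be illustrated with a minimal sketch, assuming a proxy that runs outside the sandbox: code in the sandbox builds an unauthenticated request, and the proxy checks an egress allowlist and attaches the token from a vault on the way out. `Vault`, `CredentialProxy`, and `OutboundRequest` are hypothetical, and the sketch returns the outgoing request rather than performing real network I/O.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class OutboundRequest:
    url: str
    headers: dict[str, str] = field(default_factory=dict)


class Vault:
    """Stand-in secret store. Only the proxy holds a reference to it."""

    def __init__(self, secrets: dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def token_for(self, host: str) -> Optional[str]:
        return self._secrets.get(host)


class CredentialProxy:
    """Runs outside the sandbox and attaches credentials on the way out."""

    def __init__(self, vault: Vault, allowed_hosts: set[str]) -> None:
        self._vault = vault
        self._allowed = set(allowed_hosts)

    def forward(self, req: OutboundRequest) -> OutboundRequest:
        host = urlsplit(req.url).hostname or ""
        if host not in self._allowed:
            raise PermissionError(f"egress to {host!r} is not allowed")
        headers = dict(req.headers)  # copy: the sandbox's request is never mutated
        token = self._vault.token_for(host)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"
        # A real proxy would send this upstream; here we just return it.
        return OutboundRequest(req.url, headers)


if __name__ == "__main__":
    proxy = CredentialProxy(Vault({"api.example.com": "s3cr3t"}), {"api.example.com"})
    from_sandbox = OutboundRequest("https://api.example.com/v1/invoices")
    sent = proxy.forward(from_sandbox)
    print(from_sandbox.headers)  # {}: the sandbox never held the token
    print(sent.headers)          # {'Authorization': 'Bearer s3cr3t'}
```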

Claude Blog

Claude Managed Agents: get to production 10x faster

Claude launches Managed Agents for faster production deployment

Lead: Claude Managed Agents is now in public beta, offering composable APIs and hosted infrastructure to build, run, and govern cloud agents in days instead of months.

Numbers:

  • Claimed to help teams ship agents “10x faster”
  • Internal testing on structured file generation improved task success by up to 10 points versus a standard prompting loop
  • Long-running sessions can persist for hours, even across disconnections
  • Some customer integrations shipped in weeks instead of months; one team deployed specialist agents within a week

So What: Builders can skip the hardest parts of productionizing agents—sandboxing, checkpointing, permissions, auth, tracing, and orchestration—and focus on the user experience. Claude runs the agent harness, manages tool use, context, and recovery, while the Console exposes session tracing, integration analytics, and troubleshooting. The platform also supports multi-agent coordination in research preview and lets teams choose between autonomous execution and tighter prompt-and-response control. As the post puts it, “You define your agent’s tasks, tools, and guardrails and we run it on our infrastructure.” For teams shipping coding, productivity, finance, or legal agents, this means faster launches and less custom infra work.
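The “tasks, tools, and guardrails” framing, plus the choice between autonomous execution and tighter prompt-and-response control, can be sketched as a small permission check. This is a hypothetical model, not the Managed Agents API: `AgentSpec`, `Mode`, and `authorize` are invented for illustration, and `approve` stands in for whatever human-in-the-loop step a real deployment would use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    AUTONOMOUS = "autonomous"  # allowed tool calls run without a human in the loop
    CONFIRM = "confirm"        # every allowed call waits for a human yes/no


@dataclass(frozen=True)
class AgentSpec:
    task: str
    tools: frozenset[str]
    blocked_patterns: tuple[str, ...] = ()
    mode: Mode = Mode.CONFIRM


def authorize(spec: AgentSpec, tool: str, argument: str,
              approve: Callable[[str, str], bool]) -> bool:
    """Decide whether one proposed tool call may run under the spec's guardrails."""
    if tool not in spec.tools:
        return False  # the agent was never granted this tool
    if any(pattern in argument for pattern in spec.blocked_patterns):
        return False  # a guardrail matched, regardless of mode
    if spec.mode is Mode.CONFIRM:
        return approve(tool, argument)  # tighter prompt-and-response control
    return True


if __name__ == "__main__":
    contracts = AgentSpec(
        task="Summarize new vendor contracts",
        tools=frozenset({"read_file", "write_summary"}),
        blocked_patterns=("rm -rf",),
        mode=Mode.AUTONOMOUS,
    )

    def deny_all(tool: str, arg: str) -> bool:
        return False

    print(authorize(contracts, "read_file", "contracts/acme.pdf", deny_all))  # True
    print(authorize(contracts, "send_email", "legal@acme.test", deny_all))    # False
```

Guardrails are checked before the mode, so switching an agent to autonomous never widens what it is allowed to touch; it only removes the human confirmation step.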

PODCAST HIGHLIGHTS
1

Notion bets on agents, but only after the models earn it

The Takeaway: Notion’s edge isn’t hype—it’s knowing when to wait, when to rebuild, and when to ship anyway.

  • They rebuilt custom agents four or five times because the models were “too dumb” and the context was too short; patience beat premature polish.
  • Their real advantage is not raw AI capability but product judgment: “not swimming upstream” and spotting when the river changes direction.
  • The company treats agents as the future interface, so every product team now has to build for both humans and agents—not just bolt AI on top.

Simon Last, Notion’s cofounder, and Sarah Sachs, who leads much of the AI org, describe a team that’s been grinding on agents since late 2022. Early attempts failed because tool calling didn’t really exist yet, and even when it did, reliability wasn’t good enough for background work. The breakthrough came later, but the lesson wasn’t “wait for better models” so much as learn how to read the moment: build ahead of capability, but don’t keep forcing a dead end.

Sarah’s framing is the sharpest: the job is to keep the company from “swimming upstream,” while also preparing for the current to shift. That shows up in how Notion runs AI. They don’t worship hackathons, but they do use them to spread fluency. They don’t rely on top-down ideas; they let prototypes from curious builders become real products. And they don’t treat evals as bureaucracy—they’ve built an “agent dev velocity” org so teams can own their own tests and keep shipping safely.

The result is a culture where “demos over memos” isn’t a slogan, it’s the operating system. Notion’s bet is that the software factory future won’t come from one giant agent, but from a lot of small, well-instrumented ones working inside a product people already trust.
