AI Builders Brief — 2026-05-20

Follow builders, not influencers.


25+ builders tracked

TL;DR

Karpathy joined Anthropic for the next LLM frontier. Google bet on a 24/7 AI agent, Cursor turned Composer 2.5 into the workflow, and Anthropic shipped Managed Agents for durable long-horizon work.

BUILDER INSIGHTS

Andrej Karpathy CTO

Karpathy joins Anthropic for the next LLM frontier

He says the next few years at the frontier of LLMs will be especially formative, and he’s jumping into Anthropic to get back to R&D. He also says education still matters to him, and he plans to return to that work later.


Josh Woodward VP, Google

Google’s betting on a 24/7 AI agent

Gemini Spark is Google’s new always-on personal agent, built to proactively handle tasks and help manage your digital life — with the user still in control. It’s rolling out to trusted testers this week, then to US Google AI Ultra subscribers in beta next week.


Ryo Lu Cursor_ai

Cursor’s Composer 2.5 becomes the whole workflow

Composer 2.5 is now doing the full loop for him: planning, building, iterating, and debugging. He says it’s especially strong for UI work because Design Mode keeps you in flow inside Cursor.


Google Labs

Google Labs ships Genie worlds from real places

Project Genie just got a real upgrade: you can now start worlds from Google Maps Street View locations, organize creations in a library, and share them externally. They also teased Computational Discovery, where AlphaEvolve and ERA generate and test thousands of code variants to speed up model and algorithm discovery.


Aaron Levie CEO, Box

Enterprise AI is becoming a token-budget war

He says token costs are about to become a top CIO headache, and nobody in Fortune 500 land feels like they have a clean answer yet. The current playbook is messy: throttle by workload, tier access by user, cap spend by team, or make teams prove the ROI before they get more AI. He also says Gemini 3.5 Flash is a real jump for knowledge work, with Box AI tests showing a 12-point gain on complex document tasks and big wins in healthcare, public sector, and life sciences.
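The four controls in that playbook can be pictured as a single admission check that runs before a request reaches a model. The following is a minimal sketch under stated assumptions: `BudgetPolicy`, `admit`, and every limit shown are hypothetical names and numbers chosen for illustration, not anything from Box or a vendor API.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetPolicy:
    """Hypothetical policy combining the four controls listed above."""
    team_caps: dict[str, int]        # cap spend by team (tokens per period)
    tier_limits: dict[str, int]      # tier access by user (max tokens per request)
    throttled_workloads: set[str]    # throttle by workload
    roi_approved: set[str] = field(default_factory=set)  # teams that proved ROI
    roi_bonus: int = 0               # extra tokens granted to those teams
    spent: dict[str, int] = field(default_factory=dict)

    def admit(self, team: str, tier: str, workload: str, tokens: int) -> tuple[bool, str]:
        """Return (allowed, reason) and record usage when the request is allowed."""
        if workload in self.throttled_workloads:
            return False, "workload throttled"
        if tokens > self.tier_limits.get(tier, 0):
            return False, "request exceeds tier limit"
        cap = self.team_caps.get(team, 0)
        if team in self.roi_approved:
            cap += self.roi_bonus    # proven ROI unlocks a larger budget
        used = self.spent.get(team, 0)
        if used + tokens > cap:
            return False, "team cap reached"
        self.spent[team] = used + tokens
        return True, "ok"
```

The checks run cheapest first, so a throttled workload is rejected before any budget accounting happens. A real deployment would need persistent counters and period resets, which this sketch leaves out.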


Guillermo Rauch CEO, Vercel

Vercel wants CDN bills to stop spiking on virality

They’re shipping a CDN pricing model that smooths over traffic spikes and viral events, so teams get predictable costs without sacrificing performance or being pushed onto slower routes or priority tiers. He also teased Claude Managed Agents working with Vercel Sandbox, but the main news is infra pricing that behaves less like a surprise tax.


Matt Turck FirstMarkCap

Google’s new Gemini looks seriously competitive

He says Gemini 3.5 Flash is a real step up: stronger multimodal performance, big gains in agentic coding, and leading scores on several key benchmarks. The caveats are obvious: these are still only benchmark results, and the model is not cheap. His read is that Google is back in the race and the three-lab competition is making everyone better.


Nikunj Kothari Partner, fpvventures

AI is moving from assistant to autonomous worker

He says the Bay Area still hasn’t fully priced in the shift from assistants to coworkers to autonomous workers. His take: the missing pieces are long-horizon training data, better task harnesses, and models that can now self-correct well enough to keep pushing into real jobs over the next 10–20 years.


Garry Tan CEO, Y Combinator

LLMs finally make WinFS feel real

He says the old Microsoft dream of WinFS — a system that actually understands and organizes your files — is now plausible with LLMs, and that’s basically what his GBrain project is aiming at. It’s a neat full-circle take from someone who worked on the original effort in 2003-2005: same problem, new tech, finally enough intelligence to pull it off.


Peter Yang

Short roadmaps, fast iterations win

He says the best builders don’t sit on year-long plans — they ship, learn, and iterate 3-4 times to see what sticks. The vibe is very product-lead-in-an-AI-era: 90-day roadmaps, constant experimentation, and keeping the builder muscle from turning to mush.


Swyx dxtipshq

AI coding needs tests, plans, and constant steering

He lays out a 4-part AI SDLC: start with real test coverage, use `/plan` to carve up hot paths and improve maintainability, let the agent break backward compatibility when needed, then keep spot-checking its output and steering it through bugs as it ships. The message is clear: AI coding works best when you treat the agent like an aggressive junior engineer that still needs guardrails, logging, and human judgment.


BLOG UPDATES

Anthropic Engineering

Scaling Managed Agents: Decoupling the brain from the hands

Anthropic launches Managed Agents for durable long-horizon work

Lead: Anthropic introduced Managed Agents, a hosted Claude Platform service that separates the agent “brain” from its “hands” and session state so long-running work can survive failures, scale better, and adapt as models improve.

Numbers:

p50 time-to-first-token dropped roughly 60%
p95 time-to-first-token dropped over 90%
The architecture is built around three interfaces: session, harness, and sandbox

So What: For builders, the key shift is from fragile, container-coupled agents to a meta-harness where each piece can fail or be replaced independently. The session becomes a durable event log, the harness can restart from `wake(sessionId)`, and tools/sandboxes are invoked through simple interfaces like `execute(name, input) → string`. Anthropic also tightened security by keeping credentials out of the sandbox, using patterns like repo-scoped tokens for Git and a vault-backed proxy for MCP/OAuth tools. The practical takeaway: you can now build agents that are easier to debug, safer to connect to external infrastructure like a VPC, and faster to start when a container isn’t immediately needed. As the post puts it, the goal is a system for “programs as yet unthought of.”
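The separation described above can be sketched in a few lines. Only the shapes `wake(sessionId)` and `execute(name, input) → string` come from the post as quoted here. The `Session`, `Sandbox`, and `Harness` classes, the event dictionaries, and the `decide` callable standing in for the model are all assumptions made for illustration, not Anthropic's implementation.

```python
from typing import Callable, Optional

Event = dict  # assumed shape, e.g. {"type": "tool_call", "name": ..., "input": ...}


class Session:
    """Append-only event log. In memory here; assumed durable in a real system."""

    def __init__(self) -> None:
        self._logs: dict[str, list[Event]] = {}

    def append(self, session_id: str, event: Event) -> None:
        self._logs.setdefault(session_id, []).append(event)

    def events(self, session_id: str) -> list[Event]:
        return list(self._logs.get(session_id, []))


class Sandbox:
    """The 'hands': tools reached through the execute(name, input) -> string shape."""

    def __init__(self, tools: dict[str, Callable[[str], str]]) -> None:
        self._tools = tools

    def execute(self, name: str, input: str) -> str:  # parameter name follows the post
        return self._tools[name](input)


class Harness:
    """The 'brain' loop. It keeps no state of its own, so any instance can resume."""

    def __init__(self, session: Session, sandbox: Sandbox,
                 decide: Callable[[list[Event]], Optional[Event]]) -> None:
        self.session, self.sandbox, self.decide = session, sandbox, decide

    def wake(self, session_id: str, max_steps: int = 10) -> None:
        for _ in range(max_steps):
            history = self.session.events(session_id)  # rebuild context from the log
            call = self.decide(history)                # stand-in for the model
            if call is None:
                return                                 # nothing left to do
            self.session.append(session_id, call)
            output = self.sandbox.execute(call["name"], call["input"])
            self.session.append(session_id, {"type": "tool_result", "output": output})
```

Because the harness reads everything it needs from the session log, a fresh `Harness` can call `wake` with the same session id after a crash and continue from the last recorded event. This sketch does not handle a crash between recording a tool call and recording its result; a real system would need a rule for that case.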


Claude Blog

New in Claude Managed Agents: self-hosted sandboxes and MCP tunnels

Claude Managed Agents add self-hosted sandboxes and MCP tunnels

Lead: Claude Managed Agents can now run tools in self-hosted sandboxes you control and connect to private MCP servers through MCP tunnels, keeping agent execution and internal services inside your enterprise perimeter.

Numbers:

Self-hosted sandboxes are in public beta on the Claude Platform.
MCP tunnels are in research preview and require access request.
Supported sandbox providers include Cloudflare, Daytona, Modal, and Vercel.

So What: Builders can now keep sensitive files, packages, credentials, and network access under their own security and runtime controls while Anthropic still handles the agent loop for orchestration, context management, and error recovery. Anthropic says, “files and repositories don’t leave,” and MCP tunnels let agents reach internal databases, private APIs, knowledge bases, and ticketing systems without public endpoints or inbound firewall changes. The practical takeaway: choose a sandbox provider that matches your workload—stateful long-running jobs, GPU-heavy tasks, or low-latency startup—and use MCP tunnels to expose internal tools safely to agents. The feature is available in Managed Agents, and MCP tunnels also work in the Messages API.


PODCAST HIGHLIGHTS

Training Data

Rebuilding IT From the Ground Up for the AI Age: Serval's Jake Stauch

Serval wins by making automation easier than manual work

The Takeaway: The real moat in AI enterprise software isn’t raw model power — it’s making automation so easy, safe, and useful that people choose it over doing the task manually.

Key Insights

Serval keeps the old enterprise primitives — workflows and databases — but uses AI to generate and maintain them instantly, instead of making teams wait weeks for developers.
The product has to be simpler than the manual workaround, or nobody will use it; if resetting a password is easier than building the workflow, the workflow loses.
In AI-native software, the boundary layer matters more than the model layer: permissions, approvals, audits, logs, and scoped tools are what let enterprises trust the system.
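The boundary-layer idea in the last insight can be made concrete with a small wrapper that every tool call must pass through. This is a sketch under assumptions, not Serval's design: `ScopedTool`, `Boundary`, the role names, and the outcome strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ScopedTool:
    """A tool plus the limits on who may use it (hypothetical shape)."""
    name: str
    run: Callable[[str], str]
    allowed_roles: set[str]
    needs_approval: bool = False


@dataclass
class Boundary:
    """Checks permissions and approvals, and logs every attempt."""
    tools: dict[str, ScopedTool]
    audit_log: list[dict] = field(default_factory=list)

    def call(self, tool_name: str, arg: str, role: str,
             approved_by: Optional[str] = None) -> str:
        tool = self.tools[tool_name]
        if role not in tool.allowed_roles:
            outcome = "denied: role not permitted"
        elif tool.needs_approval and approved_by is None:
            outcome = "pending: approval required"
        else:
            outcome = tool.run(arg)
        # Denied and pending attempts are logged too, so audits see everything.
        self.audit_log.append({"tool": tool_name, "role": role,
                               "approved_by": approved_by, "outcome": outcome})
        return outcome
```

The point of the design is that the model never calls a tool directly: the wrapper decides, and the log records the decision either way.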

The Story
Jake Stauch, founder and CEO of Serval, is rebuilding enterprise service management for the AI age. His core belief is blunt: employees should get help at work instantly, and the software should do the boring coordination behind the scenes. Serval’s “cogen” engine turns natural language into code, so admins can describe a workflow and have it appear immediately, with the database kept current automatically.

What makes that philosophy sharp is Jake’s obsession with usability. He argues that if automation is harder to create than the manual task, people will always default to the manual path. That’s why Serval also built an agent that detects duplicate workflows and helps clean up the mess when teams over-automate. As he puts it, “the product is the boundaries” — the controls are what make AI safe enough for enterprise use.

Jake is also unusually customer-immersed: he says he’s in every customer Slack channel and uses that constant feedback as the company’s real moat. The result is a product shaped less by theory than by lived friction, from AI-native startups to giant enterprises where tickets disappear into “the abyss.”

