AI Builders Brief — 2026-05-28

Follow builders, not influencers.

25+ builders tracked

TL;DR

Every argued that AI lowers the floor but raises demand for experts. Anthropic fixed three Claude Code regressions, reset usage limits, and introduced Managed Agents to separate agent brains from their tools. Claude Managed Agents also added self-hosted sandboxes and private MCP tunnels.

BLOG UPDATES

Anthropic Engineering

An update on recent Claude Code quality reports

Anthropic fixes three Claude Code regressions, resets limits

Lead: Anthropic says recent complaints about Claude Code quality came from three separate product changes, not a model or API degradation, and that all three were fixed as of April 20 (v2.1.116).

Numbers:

3 distinct issues affected Claude Code, the Claude Agent SDK, and Claude Cowork
March 4: default reasoning effort changed from high to medium, then reverted April 7
March 26: a caching bug was introduced, then fixed April 10 in v2.1.101
April 16: a verbosity-reducing prompt change shipped, then was reverted April 20
One eval showed a 3% drop for Opus 4.6 and 4.7 after prompt ablations

So What: For builders, the key takeaway is that product-layer changes can look like model regressions even when the API is fine. Anthropic is now resetting usage limits for all subscribers and tightening release controls: broader per-model evals, more ablations, soak periods, gradual rollouts, and better internal dogfooding on the exact public build. The company also plans to improve its code review tooling and add guidance so model-specific changes are gated to the right model. As Anthropic put it, “We never intentionally degrade our models,” and the fixes were driven largely by reproducible user feedback.
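A minimal sketch of two of the controls named above, gradual rollouts and gating a model-specific change to the right model. Everything here is an assumption for illustration: the function names, the `change` dictionary shape, and the placeholder model names are hypothetical and are not Anthropic's tooling.

```python
import hashlib


def in_rollout(user_id: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` of 100 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def select_prompt(model: str, user_id: str, change: dict, baseline: str) -> str:
    """Return the changed prompt only for targeted models and rolled-out users."""
    # Gate 1: a model-specific change never reaches models it was not evaluated on.
    if model not in change["models"]:
        return baseline
    # Gate 2: only a stable slice of users sees the change during the soak period.
    if not in_rollout(user_id, change["percent"]):
        return baseline
    return change["prompt"]


# Hypothetical change record: targets one model, rolled out to 10% of users.
change = {"models": {"model-a"}, "percent": 10, "prompt": "Be concise."}
print(select_prompt("model-b", "user-123", change, "Default prompt."))
```

Hashing the user ID keeps each user in the same bucket across requests, so raising `percent` only ever adds users to the rollout and a regression report can be tied back to a specific cohort.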

Anthropic Engineering

Scaling Managed Agents: Decoupling the brain from the hands

Anthropic launches Managed Agents to decouple agent brains and tools

Lead: Anthropic introduced Managed Agents, a hosted Claude Platform service for long-horizon agents that separates the “brain” (Claude and its harness) from the “hands” (sandboxes/tools) and the durable session log.

Numbers:

p50 time-to-first-token dropped roughly 60%.
p95 time-to-first-token dropped over 90%.
The architecture supports many stateless harnesses and many independent tool environments.

So What: The big shift is architectural: sessions live outside the harness, tools are invoked through a simple `execute(name, input) → string` interface, and credentials stay out of the sandbox. That makes agents easier to recover, safer against prompt injection, and more flexible across VPCs, Git, MCP, and custom tools. Anthropic’s message is that harness assumptions can go stale as models improve, so Managed Agents is designed as a “meta-harness” that can survive future changes. As the post puts it, “we aimed to design a system for ‘programs as yet unthought of.’” For builders, the practical takeaway is to move long-running state into durable sessions, treat compute and tools as swappable interfaces, and only provision sandboxes when needed.
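To make the split concrete, here is a rough sketch of the pattern as described: a durable session log that lives outside the harness, and tools reached through a single `execute(name, input)` call that returns a string. The class and function names (`SessionLog`, `Hands`, `run_step`, `replay`) are hypothetical and are not the Managed Agents API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ToolFn = Callable[[dict], str]


@dataclass
class SessionLog:
    """Append-only record of a session, stored outside any single harness."""
    events: List[dict] = field(default_factory=list)

    def append(self, event: dict) -> None:
        self.events.append(event)


class Hands:
    """Tool environment with one entry point: execute(name, input) -> str."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def execute(self, name: str, input: dict) -> str:
        # `input` mirrors the parameter name quoted in the post.
        if name not in self._tools:
            return f"error: unknown tool {name!r}"
        return self._tools[name](input)


def run_step(log: SessionLog, hands: Hands, name: str, input: dict) -> str:
    """Stateless harness step: the call and its result both go to the log."""
    log.append({"type": "tool_call", "name": name, "input": input})
    result = hands.execute(name, input)
    log.append({"type": "tool_result", "name": name, "output": result})
    return result


def replay(log: SessionLog) -> List[str]:
    """A replacement harness can recover prior results from the log alone."""
    return [e["output"] for e in log.events if e["type"] == "tool_result"]


hands = Hands()
hands.register("echo", lambda inp: inp["text"].upper())
log = SessionLog()
run_step(log, hands, "echo", {"text": "hi"})
print(replay(log))
```

Because `run_step` holds no state of its own, any harness instance can pick up a session by reading the log, and `Hands` can be swapped for a different tool environment without touching the loop.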

Claude Blog

New in Claude Managed Agents: self-hosted sandboxes and MCP tunnels

Claude adds self-hosted sandboxes and private MCP tunnels

Lead: Claude Managed Agents can now run tool execution inside self-hosted sandboxes and connect to private MCP servers, keeping agent work inside an enterprise-controlled perimeter.

Numbers:

Self-hosted sandboxes are in public beta on the Claude Platform.
MCP tunnels are in research preview and require access.
Supported sandbox providers include Cloudflare, Daytona, Modal, and Vercel.

So What: This is a meaningful step for teams that want Claude agents to work on sensitive code, files, and internal systems without sending data outside their environment. Anthropic says the “agent loop” stays on its infrastructure, while execution moves to your chosen sandbox, where you control compute, runtime image, logging, and network policy. For private tools, MCP tunnels let agents reach internal databases, APIs, knowledge bases, and ticketing systems through a single outbound connection—“no inbound firewall rules, no public endpoints.” Builders can use this to ship more capable enterprise agents with tighter security, better observability, and infrastructure that matches workload needs, including long-running or GPU-heavy jobs.

PODCAST HIGHLIGHTS

AI & I by Every

We Automated Everything With AI and Tripled Our Headcount

AI lowers the floor, but raises demand for experts

The Takeaway: Automation doesn’t erase work; it floods the zone with “almost right” output that needs humans to finish it.

AI makes yesterday’s expert competence cheap, so more people can produce decent code, writing, and analysis fast.
That cheapness creates a glut of work that looks useful but isn’t quite right, which increases demand for experts to shape, review, and systematize it.
The real limit isn’t raw capability — it’s that agents still need human direction, and the closer they get to humans, the more valuable they become.

Dan, a writer at Every, argues from inside the machine: the company has gone from four people to 30 since the GPT-3 era, while also becoming deeply AI-native. His point is that this does not look like mass job destruction. It looks like a messy expansion of work. When everyone can use Claude Code or Codex to generate “pretty good” output, the bottleneck shifts from creation to judgment. As he puts it, AI makes “yesterday’s expert competence cheap,” but that doesn’t end the need for experts; it multiplies it.

The sharpest part of his philosophy is that agents get less valuable as they drift away from humans. The farther the system is from a person who can steer it, the more it degrades into generic, slightly-off output. That’s why Every builds rules, review layers, and editorial systems around the tools instead of pretending the tools can run themselves. The paradox is the whole thesis: more automation creates more human responsibility, because someone still has to decide what matters, what’s good enough, and what should be shipped.

His line in the sand is simple: “the further away an agent gets from a human, the less valuable it is.” That’s less a tech prediction than a management doctrine.

YouTube
