AI Builders Brief

Follow builders, not influencers.

2026.04.24

25+ builders tracked

TL;DR

Altman said Codex moved from demo to company-wide rollout, while Claude shipped persistent cross-session memory and everyday-life connectors. Masad shrugged off “Chinese distillation” panic, Dan Shipper said GPT-5.5 finally just does the work, and Peter Yang said it cleared his game-build test.

BUILDER INSIGHTS
9
01
Sam Altman

Codex is moving from demo to company-wide rollout

He says OpenAI and NVIDIA just tested a new way to deploy Codex across an entire company, and it actually worked. That’s the interesting part: this is less about a flashy AI demo and more about pushing coding agents into real enterprise workflows.

X
02
Amjad Masad CEO, replit

Open AI research beats panic about “Chinese distillation”

He says US politicians are fearmongering about Chinese distillation while Chinese scientists are sharing real AI breakthroughs openly. His take: these advances aren’t about hoarding data, and they help everyone — including small and maybe even big US labs.

X
03
Claude anthropicai

Claude adds persistent memory for agents

Memory on Claude Managed Agents is now in public beta, so agents can learn from every session instead of starting from scratch. Anthropic says the memory layer is built to balance performance and flexibility, and developers can export and manage memories via the API to keep control.

X
04
Aaron Levie CEO, box

AI won’t cut work — it expands it

He says AI isn’t shrinking workloads; it’s making more work worth starting. At Box, he’s seeing agents turn “never got done” tasks into 3-hour rabbit holes, and even make some ongoing work economical to hire out. He also says GPT-5.5 is a real step up for enterprise knowledge work, with Box’s evals showing a 10-point accuracy jump on complex content tasks.

X
05
Aditya Agarwal CTO, SouthPkCommons

SF wins by packing weird builders together

He says San Francisco’s edge isn’t just talent or VC — it’s a culture where curious, humble builders keep showing up, jamming, and pushing weird ideas until they work. He points to a design talk at South Park Commons and a mind-bending pixel-generation demo as proof that the city’s density of builders is what keeps producing outsized breakthroughs.

X
06
Dan Shipper CEO, every

GPT-5.5 stops planning and just does the work

He says many models can outline a great plan, then hesitate — but OpenAI’s GPT-5.5 actually follows through. His take is that this is a real behavior shift, not just a benchmark bump, and he’s framing it as a practical upgrade for anyone using AI to get work done.

X
07
Garry Tan CEO, ycombinator

GBrain gets smarter with graph + vector search

He says GBrain’s new evals show a big jump when you layer graph search and vector search on top of grep across knowledge wikis. He’s also pushing more of his OpenClaw cron jobs and subagents onto GBrain Minions, with stability work aimed at making that infra stick.

X
08
Peter Yang

GPT-5.5 finally clears the game-build test

He says GPT-5.5 plus Codex is the first model combo that actually built a working F-Zero-style game in his recurring benchmark. That’s a pretty clean signal that the new stack is moving from demos to real, playable output — and he’s already using it to spin up bots to race against.

X
09
Nikunj Kothari Partner, fpvventures

M&A is about to outpace fundraising

He says the startup market is tilting hard toward acquisitions: the seed-to-A gap is widening, 2021 zombiecorns are finally getting cleaned up, and talent is flowing to big token factories. His blunt takeaway as an FPV Ventures partner: there are plenty of founders, but very few real entrepreneurs — don’t start a company unless you can’t do anything else.

X
BLOG UPDATES
3
Anthropic Engineering

An update on recent Claude Code quality reports

Anthropic fixes three Claude Code regressions, resets limits

Lead: Anthropic says recent quality complaints about Claude Code came from three separate product-side changes—not the API or core models—and all have now been fixed in v2.1.116.

Numbers:

  • 3 distinct issues affected Claude Code, Claude Agent SDK, and Claude Cowork
  • Fixes landed on April 7, April 10, and April 20
  • The prompt change caused a 3% drop in broader evals
  • Usage limits are being reset for all subscribers as of April 23

So What: The company is tightening release controls because the regressions made Claude feel “less intelligent,” forgetful, and overly terse in some sessions. One bug repeatedly dropped prior reasoning after idle sessions, another defaulted users from high to medium effort, and a prompt tweak to reduce verbosity hurt coding quality. Anthropic says it will broaden internal testing on the exact public build, expand code review context, add per-model evals and prompt ablations, and use soak periods plus gradual rollouts for any change that could trade off against intelligence. As the post puts it, “We take reports about degradation very seriously.”

Claude Blog

New connectors in Claude for everyday life

Claude adds everyday-life connectors for travel, shopping, and more

Lead: Claude is expanding its connector ecosystem beyond work tools to include everyday apps like AllTrails, Instacart, Audible, TripAdvisor, TurboTax, Uber, and more, so users can act on personal tasks directly inside chat.

Numbers:

  • Claude directory has grown to 200+ connectors since launching in July 2025.
  • New connectors include 15+ consumer services, from travel and dining to taxes and rides.
  • Connectors are available on all plans; mobile is in beta.

So What: The big shift is that Claude now surfaces the right app dynamically based on your intent, context, and preferences, then keeps the workflow in one thread. Anthropic says, “Claude suggests the right app for what you’re doing,” and if multiple connectors fit, it shows options ranked by usefulness. For builders, this means a larger distribution surface for apps that can be installed into Claude’s directory, while users get a more agentic assistant that can recommend, compare, and prepare actions without leaving the conversation. Privacy and control remain central: no ads, no sponsored placements, app data isn’t used to train models, and Claude must ask before booking or purchasing on your behalf.

Claude Blog

Built-in memory for Claude Managed Agents

Claude Managed Agents get built-in cross-session memory

Lead: Memory for Claude Managed Agents is now in public beta, letting agents learn from every session through a filesystem-based layer designed for long-running production use.

Numbers:

  • Public beta available today
  • Rakuten says first-pass errors fell by 97%
  • Wisedocs reports verification sped up by 30%
  • Memory stores can be shared across multiple agents with different access scopes

So What: This removes a major piece of custom infrastructure for teams building persistent agents: memory is portable, API-manageable, and auditable, with export, rollback, and redaction built in. Because memories are stored as files and mounted directly onto the filesystem, Claude can use the same bash and code execution tools it already relies on, while keeping “full control over what agents retain.” The practical payoff is better continuity across sessions, fewer repeated mistakes, and easier enterprise governance. Teams can use org-wide read-only stores, per-user read/write stores, and concurrent agents without overwriting each other. In short, Claude is positioning memory as a native capability for agents that need to improve over time, not a separate retrieval system you have to assemble yourself.
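
For builders who want a feel for the pattern, here is a minimal sketch of file-backed memory with two access scopes: an org-wide read-only store and a per-user read/write store. This is not Anthropic's API. The `MemoryStore` class, the `recall` helper, and the file names are all hypothetical, and the read-only flag is enforced in Python rather than by a real filesystem mount.

```python
import tempfile
from pathlib import Path
from typing import Optional


class MemoryStore:
    """Hypothetical memory store: one directory of plain-text notes plus a scope flag."""

    def __init__(self, root: Path, writable: bool) -> None:
        self.root = root
        self.writable = writable
        self.root.mkdir(parents=True, exist_ok=True)

    def read(self, name: str) -> Optional[str]:
        """Return the note's text, or None if no such note exists."""
        path = self.root / name
        return path.read_text(encoding="utf-8") if path.exists() else None

    def write(self, name: str, text: str) -> None:
        """Save a note; refuse if this store was opened read-only."""
        if not self.writable:
            raise PermissionError(f"{self.root.name} is read-only")
        (self.root / name).write_text(text, encoding="utf-8")


def recall(name: str, user_store: MemoryStore, org_store: MemoryStore) -> Optional[str]:
    """Prefer the per-user note and fall back to the org-wide one."""
    note = user_store.read(name)
    return note if note is not None else org_store.read(name)


if __name__ == "__main__":
    base = Path(tempfile.mkdtemp())
    # Seed the org store once with write access, then reopen it read-only.
    MemoryStore(base / "org", writable=True).write("style.md", "Use UTC timestamps.")
    org = MemoryStore(base / "org", writable=False)
    user = MemoryStore(base / "user-42", writable=True)
    user.write("prefs.md", "Prefers short answers.")
    print(recall("style.md", user, org))
    print(recall("prefs.md", user, org))
```

Because the notes are ordinary files, the same directory can be listed, diffed, or exported with standard tools, which is the property the post emphasizes.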

PODCAST HIGHLIGHTS
1

AI infra is stabilizing, but coding agents are just getting started

The Takeaway: The real shift isn’t “AI is everywhere” — it’s that coding agents have become the proving ground for a new market structure.

  • The infrastructure layer is finally settling into a usable pattern: agents now look like LLMs with tools, a file system, and “skills” as the minimal viable packaging format.
  • The biggest winners won’t just be model companies or apps; they’ll be the “outsourced AI teams” that sit between frontier models and messy enterprise workflows.
  • In coding, the market is still in capability-exploration mode, which means spending more, trying weirder things, and chasing speed can matter more than efficiency.

Swyx, the founder behind the AI Engineer events and a close observer of the developer ecosystem, argues that the last year has been less about neat product categories and more about constant adaptation. He thinks the infrastructure chaos is easing, but only because the industry has converged on a simple shape: “skills,” APIs, and agent-friendly tooling. That doesn’t mean the game is over; it means the rules are clearer.

His sharper point is that the AI coding wars are already enormous — with OpenAI, Anthropic, Cursor, and Cognition all fighting for a market that has exploded in under a year. He sees this as a momentum game, not a mean-reversion story. The mistake is assuming coding is saturated when it may still be compounding. “Why if it went from 10 to 50% in the past year, why can’t it keep going?” he asks.

That same logic applies to infra, chips, and even go-to-market. Agents are now the primary users in many systems, which means products need to be API-first, CLI-friendly, and built for machine consumption. The bigger lesson: if you want to know where AI is headed next, watch coding — because it’s the first place where the market is rewarding raw capability over polish.

ARCHIVE
2026-04-23 13 items

Claude added interactive charts and Claude Code desktop with parallel sessions; Josh Woodward shipped Gemini conversation branching. Amjad Masad said static analysis lifted LLMs 90%+, while Aaron Levie and Guillermo Rauch framed agents and petabyte-scale hunts as the new battleground.

2026-04-22 10 items

Altman said OpenAI wanted you swimming in AI—and GPUs. Masad pushed for a fairer software market, Levie said enterprise agents needed humans to actually land, and Shipper showed agents could now read voice notes.

2026-04-21 10 items

Rauch said delete isn’t rotation, Levie argued agents need operators, not just users, and Steinberger kept OpenClaw pushing AI into real workflows. Shipper backed two-agent setups, while Claude warned teams to harden security now.

2026-04-20 9 items

Rauch said an AI-accelerated attack exposed Vercel’s weak link, while Kothari warned AI will supercharge attacks too. Garry Tan called Claude Code the new app factory, and Peter Yang noted agents still flaked on boring cron jobs.

2026-04-19 8 items

Rauch said design was becoming autonomous, not just a tool. Steinberger made CodexBar safer, faster, and lighter; Anthropic added Auto Mode to Claude Code and showed benchmark scores can swing with eval infra. Levie warned AI agents would force constant rewrites.

2026-04-18 13 items

Weil folded OpenAI for Science into core teams, while Google split Flow into music-making and Josh Woodward added remix control. Albert and Peter Yang showed Claude Design turning taste into production-grade assets, and Levie, Ryo Lu, and No Priors all argued AI wins when it serves workflows, not replaces them.

2026-04-17 15 items

Anthropic launched Managed Agents to decouple agent infra, while Claude Code defaulted to xhigh effort and got a usage-focused upgrade. Rauch said agents need durability over clever prompts, and Swyx split AI engineering into slop vs rigor.

2026-04-16 14 items

Rauch said teams were building their own design factories, while Steinberger called open-source AI security a full-time arms race. Masad priced OSS trust in compute, and Woodward shipped Gemini on Mac in 100 days.

2026-04-15 15 items

Woodward said Gemini’s turning into a test-prep machine, Albert called Claude Code the whole workspace, and Cat Wu shipped a desktop control center with parallel sessions and review tools. Rauch also argued agent builders need elastic Postgres, not vibes.

2026-04-14 10 items

Rauch said the moat moved from code to the code factory, while Levie argued every team now needed an agent wrangler. Cursor leaned into customizable multi-agent views, Replit added region controls, and No Priors backed Periodic Labs’ bet that AI could learn atoms by running experiments.

2026-04-13 10 items

Amjad Masad said Apple’s 50th has turned into a PR disaster, while Aaron Levie argued agents would create more work, not cut jobs. Rauch pushed engineers into the customer hot seat, and Claude warned teams to harden security fast.

2026-04-12 11 items

Thariq said Claude Code now handles TurboTax pain, while Rauch called microVM sandboxes the new compute layer. Aditya Agarwal pushed memory over loops, and Levie argued AI won’t shrink law—it’ll inflate it.

2026-04-11 16 items

Claude pushed into Word with tracked edits, and Claude Code moved planning to the web with auto mode approvals. Garry Tan called agents the Altair BASIC era, while Aaron Levie warned software without a real API gets left behind.

2026-04-10 12 items

Karpathy said free ChatGPT lagged while frontier coding models didn’t. Albert pushed cheap-to-smart escalation, Rauch said cloud infra went agent-native, and OpenAI’s next leap looked like autonomy—not chat.

2026-04-09 16 items

Woodward gave Gemini a second brain with Notebooks, while Anthropic shipped Managed Agents to move Claude from prompt to production. Rauch called the web AI’s native OS, and Levie, Masad, and Shipper all bet agents will do the work, not the people.

2026-04-08 12 items

Albert teased Anthropic’s Mythos Preview, Cat Wu juiced Claude Code’s CLI tricks, and Peter Steinberger patched CodexBar with 2 providers plus billing fixes. Levie said agents are eating knowledge work, while Nikunj Kothari preached retention over launch hype.

2026-04-07 8 items

Levie said agents won’t erase work, just push it up a layer; Yang argued they’ll shrink teams, not ambition. Garry Tan flagged an unpatched file leak in Claude’s coding env, while Kothari called Anthropic’s revenue ramp absurdly fast.

2026-04-06 10 items

Rauch said v0 now builds physics, not just UI, while Karpathy noted GitHub Gists have weirdly good comments. Levie argued AI efficiency creates more work, not less, and Tan called open source’s golden age.

2026-04-05 4 items

Karpathy pushed “your data, your files, your AI.” Levie argued context beat raw model IQ in enterprise AI. Garry Tan said GStack kept shipping security fixes fast, while No Priors spotlighted Periodic Labs’ bet on atoms, not just text.

2026-04-04 9 items

Claude plugged into Microsoft 365 everywhere, Swyx said Devin one-shot blog-to-code, and Peter Steinberger called out GitHub’s API as still not built for agents. Aaron Levie hit the context wall, while Garry Tan shipped a DX review tool from his own stack.

2026-04-03 10 items

Claude landed computer use on Windows, Karpathy argued LLMs should build your wiki, and Amjad Masad pushed Replit deeper into enterprise sales. Peter Yang said Cursor 3 got out of the agent’s way, while Peter Steinberger warned AI slop was flooding kernel security with real bugs.

2026-04-02 12 items

Steinberger called plan mode training wheels, while Thariq gave Claude Code a mouse-friendly renderer and Cat Wu showed sessions jumping phone-to-laptop. Masad framed Replit as an OS for agents, Rauch said Vercel signups compounded fast, and Anthropic’s infra tweaks swung coding scores by 6 points.

2026-04-01 4 items

Levie said AI productivity hit the enterprise risk wall, while Weil argued proofs got cleaner, not just better. Agarwal floated public source code as the new prod debugging, and Data Driven NYC claimed one founder could run a company if agents handled the layers below.

2026-03-31 15 items

Karpathy warned unpinned deps can turn one hack into mass pwnage, while Rauch and Levie said agents still need human guardrails and redesigned workflows. Meanwhile Claude Code got enterprise auto mode, Replit added built-in monetization, and Swyx spotted “Sign in with ChatGPT” already live.

2026-03-29 7 items

Andrej Karpathy highlighted how LLMs can argue any side, suggesting we use it as a feature. Guillermo Rauch finally shipped his dream text layout, bringing his vision to life. Meanwhile, Amjad Masad claimed AI is democratizing app building and elevating top engineers.

2026-03-28 7 items

Andrej Karpathy suggested leveraging LLMs' ability to argue any side as a feature. Guillermo Rauch turned text layout dreams into reality with Vercel's latest feature. Meanwhile, Amjad Masad claimed AI is democratizing app building, liberating top engineers for bigger challenges.