AI Builders Brief

Follow builders, not influencers.

2026.04.27

25+ builders tracked

TL;DR

Altman called for an agent-first reset of OSes and the internet, while Rauch said coding agents were the base layer of superintelligence. Levie argued AI hid hard parts instead of killing jobs, and Anthropic shipped Auto mode for safer no-prompt Claude Code.

BUILDER INSIGHTS
6
01
Sam Altman

OSes and the internet need an agent-first reset

He says it’s time to seriously rethink operating systems and user interfaces — and even the internet itself — so they work equally well for people and agents. That’s the bigger signal here: OpenAI’s Sam Altman is pushing past chatbots and toward a world where software is built for humans and AI to use side by side.

X
02
Aaron Levie, CEO, Box

AI doesn’t kill jobs — it hides the hard parts

He says people massively overestimate how easy it is to automate a whole role after seeing AI handle one task. The real work is the messy last mile: data access, context, review, and plugging outputs into business processes — which is why Box’s CEO is skeptical of the loudest job-loss predictions.

X
03
Guillermo Rauch, CEO, Vercel

Coding agents are the base layer of superintelligence

He argues coding agents will become the foundation of superintelligence because programming is really just proficiency with computers — bash, filesystems, installs, configs, the whole stack. The bigger point: agents that can inspect and improve their own code could start self-optimizing, with humans keeping the audit trail. As Vercel’s CEO, he’s basically saying coding fluency is the shortest path to models that understand and reshape software.

X
04
Peter Steinberger, OpenClaw

Local AI tooling, backed up like real infra

He’s shipping practical dev tools, not AI fluff: `wacrawl 0.2.0` adds encrypted Git backup/restore for WhatsApp Desktop archives, and `birdclaw` turns tweet archives into local storage with GitHub backups and daily bookmark imports. He also says OpenClaw’s test suite was CPU-bound until moving local runs to Blacksmith, where Codex can spin up 32 vCPU instances and blast through tests.

X
05
Garry Tan, CEO, Y Combinator

Agents need a constitution, not a prompt

He says the secret to an articulate agent is splitting it into three docs: SOUL.md for voice and values, USER.md for a deep model of the person, and AGENTS.md for operational rules. His point is blunt: generic instructions get you generic chatbot sludge; specific, opinionated guidance makes the agent feel alive.
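The three-doc split above can be sketched in a few lines. This is a minimal, hypothetical helper (not from Tan's post) that assembles the constitution docs into one system prompt, keeping voice and values first:

```python
from pathlib import Path

# Fixed order: voice/values first, then the user model, then operating rules.
# The file names come from the post; the loader itself is a hypothetical sketch.
DOC_ORDER = ["SOUL.md", "USER.md", "AGENTS.md"]

def build_system_prompt(doc_dir: str) -> str:
    """Concatenate the agent's 'constitution' docs into one system prompt."""
    parts = []
    for name in DOC_ORDER:
        path = Path(doc_dir) / name
        if path.exists():  # tolerate a missing doc rather than crash
            parts.append(f"# {name}\n{path.read_text().strip()}")
    return "\n\n".join(parts)
```

The fixed ordering is the point: the agent reads who it is before it reads what it may do.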

X
06
Peter Yang

MCP turns a fitness app into a Claude-controlled toy

He built an MCP server for his mobile fitness app so Claude/Codex can pull workout stats and update routines directly. It’s a small demo, but it shows how quickly AI assistants become useful once they can actually touch your apps and data. He also called out Google Photos for missing an obvious Gemini feature: prompting a highlight reel from your family photos.
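The core of that demo is just a tool registry the assistant can call into. A minimal stand-in sketch (not the real MCP SDK, and the tool names and data store are hypothetical) shows the shape:

```python
import json

# Hypothetical in-memory stand-in for the fitness app's data store.
WORKOUTS = {"2026-04-26": {"exercise": "squat", "sets": 5}}

def get_workout(args):
    """Read tool: return the workout logged for a date, or {}."""
    return WORKOUTS.get(args["date"], {})

def log_workout(args):
    """Write tool: store/update a workout entry for a date."""
    WORKOUTS[args["date"]] = args["entry"]
    return {"ok": True}

# Tool registry: name -> handler, mimicking how an MCP server exposes
# app functions for an assistant to invoke.
TOOLS = {"get_workout": get_workout, "log_workout": log_workout}

def handle_call(request_json: str) -> str:
    """Dispatch one JSON tool-call request and return a JSON result."""
    req = json.loads(request_json)
    handler = TOOLS.get(req["tool"])
    if handler is None:
        return json.dumps({"error": f"unknown tool: {req['tool']}"})
    return json.dumps({"result": handler(req.get("args", {}))})
```

A real MCP server wraps the same registry in the protocol's JSON-RPC transport and tool-schema handshake; the app-side work is exactly this thin.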

X
BLOG UPDATES
1
Anthropic Engineering

Anthropic adds Auto mode to Claude Code for safer no-prompt work

Lead: Anthropic introduced Auto mode for Claude Code, a new permission mode that replaces constant human approval prompts with model-based classifiers to let routine actions run while blocking dangerous ones.

Numbers:

  • Manual prompts are accepted 93% of the time, contributing to approval fatigue.
  • On 10,000 real internal tool calls, the full pipeline cut false positives to 0.4%.
  • On 52 real overeager actions, the deployed pipeline had a 17% false-negative rate.
  • On 1,000 synthetic exfiltration attempts, the full pipeline reached 5.7% FNR.
  • Stage 1 alone: 8.5% FPR and 6.6% FNR on real traffic.

So What: Auto mode is aimed at teams that want more autonomy than manual approvals but more protection than `--dangerously-skip-permissions`. Anthropic says it blocks risky behaviors like destructive deletes, credential hunting, exfiltration, and safety-check bypasses, while letting safe in-repo edits proceed without latency. The system combines a prompt-injection probe on inputs with a two-stage transcript classifier on outputs; as the post puts it, “Auto mode is meant for the first group” of users who want to skip prompts, “and for tasks where the second group’s approval overhead isn’t worth the marginal safety.” It’s not positioned as a replacement for careful review on high-stakes infrastructure, but as a safer default for everyday agentic coding.
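The pipeline's control flow can be sketched as below. The real system uses trained model-based classifiers; the keyword heuristics here are hypothetical stand-ins, just to show the escalation structure:

```python
# Hypothetical stand-ins for Anthropic's model-based classifiers: the real
# Auto mode uses models, not string matching. Structure only.
INJECTION_MARKERS = ("ignore previous instructions", "disregard your rules")
RISKY_PATTERNS = ("rm -rf", ".aws/credentials")

def probe_input(transcript: str) -> bool:
    """Input probe: flag likely prompt injection in untrusted context."""
    t = transcript.lower()
    return any(m in t for m in INJECTION_MARKERS)

def stage1_fast(action: str) -> str:
    """Cheap first pass: allow outright, or escalate for a closer look."""
    return "escalate" if any(p in action for p in RISKY_PATTERNS) else "allow"

def stage2_review(action: str) -> str:
    """Slower second pass for escalated actions only."""
    # Hypothetical rule: block destructive deletes outside the repo.
    if "rm -rf" in action and "/repo/" not in action:
        return "block"
    return "allow"

def decide(action: str, transcript: str = "") -> str:
    if probe_input(transcript):
        return "block"
    verdict = stage1_fast(action)
    return stage2_review(action) if verdict == "escalate" else verdict
```

The two-stage shape is what buys the low latency the post claims: safe in-repo edits never pay for the expensive classifier, only escalated actions do.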

PODCAST HIGHLIGHTS
1

Deterministic AI beats token guessing for mission-critical work

The Takeaway: Logical Intelligence's Eve argues that if correctness matters, language-model-style guessing is the wrong tool; you want models that can verify themselves as they reason.

  • EBMs are built to be inspectable and non-autoregressive, so they don’t “guess the next token” the way LLMs do.
  • For mission-critical systems, external checks aren’t enough: the model itself should expose structure you can verify in real time.
  • The big advantage is efficiency: fewer tokens, less compute, better fit for sparse data, spatial reasoning, and hardware/software correctness.

Eve, founder and CEO of Logical Intelligence, is pushing a blunt thesis: AI should stop pretending everything is a language problem. Her company builds both LLM prototypes and energy-based models, but the long game is EBMs—systems designed for “deterministic AI” and “verifiable AI” in places like code generation, chip design, and control systems. Her core complaint is that LLMs are black boxes that play a costly guessing game, even when you bolt on external verifiers like Lean4. That may be fine for drafting text; it’s shaky for a plane, a car, or a circuit.

Her analogy is simple and memorable: an LLM is like navigating with one turn at a time, while an EBM has the bird’s-eye view. “If you see there’s a hole, you’re gonna choose a different route.” EBMs, she says, build an energy landscape of possible states, then minimize it to find the most likely outcome. That makes them better for non-language tasks like spatial reasoning, where the world is better represented as structure than as tokens.
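The route analogy reduces to a tiny computation: score every candidate plan with an energy function and take the global minimum, rather than committing one step at a time. A toy sketch (the energy function and hazard penalty are invented for illustration):

```python
# Toy energy-based route choice: lower energy = better route.
def energy(route, hazards):
    # Hypothetical energy: path length plus a heavy penalty per hazard cell.
    return len(route) + sum(10.0 for cell in route if cell in hazards)

def best_route(candidates, hazards):
    """Bird's-eye view: evaluate whole routes, pick the global minimum."""
    return min(candidates, key=lambda r: energy(r, hazards))
```

A greedy turn-by-turn decoder would happily start down the short route and hit the hole; minimizing over whole routes sees the penalty up front, which is the EBM argument in miniature.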

She also leans hard on latent variables as a kind of internal knowledge store—less a rulebook than a compact model of how the world works. The point isn’t just prediction; it’s understanding enough to adapt when the environment changes.

STAY UPDATED

Daily builder insights, straight to your inbox.

Prefer RSS? Subscribe via RSS

ARCHIVE
2026-04-26 10 items

Altman said OpenAI still lags on frontend but wins on brains. Levie bet on weird future talent, Masad said every company turns into a cybersecurity company, and Tan showed Claude Code with a browser sidecar.

2026-04-25 16 items

Altman dropped GPT-5.5 into the API, and Cursor’s Ryo Lu bet on it plus Composer 2. Peter Yang said it can spit out a Star Fox clone; Anthropic shipped Managed Agents, while Replit, NotebookLM, and Discord all got sharper.

2026-04-24 13 items

Altman said Codex moved from demo to company-wide rollout, while Claude shipped persistent cross-session memory and everyday-life connectors. Masad shrugged off “Chinese distillation” panic, and Dan Shipper/Peter Yang said GPT-5.5 finally just does the work and clears game-build tests.

2026-04-23 13 items

Claude added interactive charts and Claude Code desktop with parallel sessions; Josh Woodward shipped Gemini conversation branching. Amjad Masad said static analysis lifted LLMs 90%+, while Aaron Levie and Guillermo Rauch framed agents and petabyte-scale hunts as the new battleground.

2026-04-22 10 items

Altman said OpenAI wanted you swimming in AI—and GPUs. Masad pushed for a fairer software market, Levie said enterprise agents needed humans to actually land, and Shipper showed agents could now read voice notes.

2026-04-21 10 items

Rauch said delete isn’t rotation, Levie argued agents need operators, not just users, and Steinberger kept OpenClaw pushing AI into real workflows. Shipper backed two-agent setups, while Claude warned teams to harden security now.

2026-04-20 9 items

Rauch said an AI-accelerated attack exposed Vercel’s weak link, while Kothari warned AI will supercharge attacks too. Garry Tan called Claude Code the new app factory, and Peter Yang noted agents still flaked on boring cron jobs.

2026-04-19 8 items

Rauch said design was becoming autonomous, not just a tool. Steinberger made CodexBar safer, faster, and lighter; Anthropic added Auto Mode to Claude Code and showed benchmark scores can swing with eval infra. Levie warned AI agents would force constant rewrites.

2026-04-18 13 items

Weil folded OpenAI for Science into core teams, while Google split Flow into music-making and Josh Woodward added remix control. Albert and Peter Yang showed Claude Design turning taste into production-grade assets, and Levie, Ryo Lu, and No Priors all argued AI wins when it serves workflows, not replaces them.

2026-04-17 15 items

Anthropic launched Managed Agents to decouple agent infra, while Claude Code defaulted to xhigh effort and got a usage-focused upgrade. Rauch said agents need durability over clever prompts, and Swyx split AI engineering into slop vs rigor.

2026-04-16 14 items

Rauch said teams were building their own design factories, while Steinberger called open-source AI security a full-time arms race. Masad priced OSS trust in compute, and Woodward shipped Gemini on Mac in 100 days.

2026-04-15 15 items

Woodward said Gemini’s turning into a test-prep machine, Albert called Claude Code the whole workspace, and Cat Wu shipped a desktop control center with parallel sessions and review tools. Rauch also argued agent builders need elastic Postgres, not vibes.

2026-04-14 10 items

Rauch said the moat moved from code to the code factory, while Levie argued every team now needed an agent wrangler. Cursor leaned into customizable multi-agent views, Replit added region controls, and No Priors backed Periodic Labs’ bet that AI could learn atoms by running experiments.

2026-04-13 10 items

Amjad Masad said Apple’s 50th has turned into a PR disaster, while Aaron Levie argued agents would create more work, not cut jobs. Rauch pushed engineers into the customer hot seat, and Claude warned teams to harden security fast.

2026-04-12 11 items

Thariq said Claude Code now handles TurboTax pain, while Rauch called microVM sandboxes the new compute layer. Aditya Agarwal pushed memory over loops, and Levie argued AI won’t shrink law—it’ll inflate it.

2026-04-11 16 items

Claude pushed into Word with tracked edits, and Claude Code moved planning to the web with auto mode approvals. Garry Tan called agents the Altair BASIC era, while Aaron Levie warned software without a real API gets left behind.

2026-04-10 12 items

Karpathy said free ChatGPT lagged while frontier coding models didn’t. Albert pushed cheap-to-smart escalation, Rauch said cloud infra went agent-native, and OpenAI’s next leap looked like autonomy—not chat.

2026-04-09 16 items

Woodward gave Gemini a second brain with Notebooks, while Anthropic shipped Managed Agents to move Claude from prompt to production. Rauch called the web AI’s native OS, and Levie, Masad, and Shipper all bet agents will do the work, not the people.

2026-04-08 12 items

Albert teased Anthropic’s Mythos Preview, Cat Wu juiced Claude Code’s CLI tricks, and Peter Steinberger patched CodexBar with 2 providers plus billing fixes. Levie said agents are eating knowledge work, while Nikunj Kothari preached retention over launch hype.

2026-04-07 8 items

Levie said agents won’t erase work, just push it up a layer; Yang argued they’ll shrink teams, not ambition. Garry Tan flagged an unpatched file leak in Claude’s coding env, while Kothari called Anthropic’s revenue ramp absurdly fast.

2026-04-06 10 items

Rauch said v0 now builds physics, not just UI, while Karpathy noted GitHub Gists have weirdly good comments. Levie argued AI efficiency creates more work, not less, and Tan called open source’s golden age.

2026-04-05 4 items

Karpathy pushed “your data, your files, your AI.” Levie argued context beat raw model IQ in enterprise AI. Garry Tan said GStack kept shipping security fixes fast, while No Priors spotlighted Periodic Labs’ bet on atoms, not just text.

2026-04-04 9 items

Claude plugged into Microsoft 365 everywhere, Swyx said Devin one-shot blog-to-code, and Peter Steinberger called out GitHub’s API as still not built for agents. Aaron Levie hit the context wall, while Garry Tan shipped a DX review tool from his own stack.

2026-04-03 10 items

Claude landed computer use on Windows, Karpathy argued LLMs should build your wiki, and Amjad Masad pushed Replit deeper into enterprise sales. Peter Yang said Cursor 3 got out of the agent’s way, while Peter Steinberger warned AI slop was flooding kernel security with real bugs.

2026-04-02 12 items

Steinberger called plan mode training wheels, while Thariq gave Claude Code a mouse-friendly renderer and Cat Wu showed sessions jumping phone-to-laptop. Masad framed Replit as an OS for agents, Rauch said Vercel signups compounded fast, and Anthropic’s infra tweaks swung coding scores by 6 points.

2026-04-01 4 items

Levie said AI productivity hit the enterprise risk wall, while Weil argued proofs got cleaner, not just better. Agarwal floated public source code as the new prod debugging, and Data Driven NYC claimed one founder could run a company if agents handled the layers below.

2026-03-31 15 items

Karpathy warned unpinned deps can turn one hack into mass pwnage, while Rauch and Levie said agents still need human guardrails and redesigned workflows. Meanwhile Claude Code got enterprise auto mode, Replit added built-in monetization, and Swyx spotted “Sign in with ChatGPT” already live.

2026-03-29 7 items

Andrej Karpathy highlighted how LLMs can argue any side, suggesting we use it as a feature. Guillermo Rauch finally shipped his dream text layout, bringing his vision to life. Meanwhile, Amjad Masad claimed AI is democratizing app building and elevating top engineers.

2026-03-28 7 items

Andrej Karpathy suggested leveraging LLMs' ability to argue any side as a feature. Guillermo Rauch turned text layout dreams into reality with Vercel's latest feature. Meanwhile, Amjad Masad claimed AI is democratizing app building, liberating top engineers for bigger challenges.