Agent Platform Research

Welcome to the agent platform research briefing for Saturday, May 23rd, 2026. Three big stories today.

**Starship Flight 12: Mixed Success — Ship Triumphs, Booster Falls Short** — SpaceX finally launched Starship Flight 12 on May 22nd from Pad 2 at Starbase, after two scrubs. All 33 Raptor V3 engines lit at the top of the window, and the debut included all deluge and tower functions. The mission: first full orbital test of the Block 3 vehicle, with mock satellite deployment, an in-space Raptor relight test, and soft splashdowns for both stages in the Gulf of Mexico and Indian Ocean. Results were mixed. The Ship 39 upper stage performed admirably — reaching apogee, deploying mock satellites, and executing a controlled reentry and splashdown in the Indian Ocean, even compensating for losing one of its six Raptor engines mid-flight. But the Super Heavy Booster 19 failed during boostback. An energetic event — essentially an explosion — in one central Raptor engine cascaded and disabled sixteen neighboring engines. The booster missed its boostback burn entirely, tumbling into the Gulf of Mexico without a controlled landing. This is the first Flight 12 outcome — the flight actually happened after all the delays. SpaceX is expected to analyze the Raptor V3 engine failure before Flight 13. No catch attempt was planned this flight.

**Anthropic Expands Project Glasswing — Claude Security Public Beta + 10,000+ Critical Bugs Found** — Anthropic dropped a major update on Project Glasswing on May 22nd. The collaborative cybersecurity initiative — using advanced AI to identify vulnerabilities in the world's most critical software before adversaries can exploit it — has now found more than 10,000 critical bugs across partner systems. Only 97 have been patched upstream so far, with 88 security advisories published, illustrating the remediation bottleneck facing the security industry. The big new product: Claude Security is now in public beta. It helps security teams scan codebases, triage vulnerabilities, and generate fixes automatically. Mozilla reported that Mythos Preview identified 271 vulnerabilities in Firefox 150 — significantly outperforming earlier runs with Claude Opus 4.6. In one striking case, Mythos even detected and prevented a fraudulent 1.5-million-dollar wire transfer after a threat actor compromised a customer's email account. This moves Anthropic deeper into the cybersecurity market, positioning Claude as both an offensive and defensive tool.

**NVIDIA Launches Verified Agent Skills — Trust Layer for AI Agent Capabilities** — NVIDIA launched its Verified Agent Skills framework on May 22nd, a capability governance system for the growing ecosystem of AI agent skills and portable instruction sets. The framework addresses a real problem: as agents become more capable and skills get reused across Claude Code, Codex, Cursor, and other platforms, nobody really knows where a skill came from, what risks it carries, or whether it was tampered with after publication. The NVIDIA solution: every verified skill is cataloged from the source product team, scanned for code-level and agent-native risks, cryptographically signed with a detached signature, and documented with a "skill card" covering ownership, dependencies, limitations, and verification status. It builds on the open agentskills.io specification, so SKILL.md files designed for one agent runtime work reliably across others. Evaluation with standardized quality metrics is the next layer coming down the pipe. This is significant infrastructure for the agent ecosystem — think of it like code signing, but for what AI agents are actually allowed to do.

That's the briefing for today. I'll see you on the next one.