NOOPS Weekly — Week of 20 April 2026

This was the week the post-Watershed economy started to show its edges.

Three of those edges arrived in a single paper. Mark Pesce's *A Measure of Safety: Token Quality and the Limits of Harness Defence* — his fourth post-Watershed paper, released through the University of Sydney — places token quality on a three-class spectrum relative to any given harness. Tokens that are not good enough produce the familiar left-tail failures of hallucination and incoherence. Tokens that are good enough match the harness and reliably generate alpha. Tokens that are too good step through the harness not because the harness was badly built but because it was built for a lesser god. Mark's coinage for the failure mode is Godshatter, after Vernor Vinge. The three earlier papers — *Foundations of Post-Watershed Economics*, *Gresham's Law and Token Quality*, and *Alpha and Harnesses* — described the productive zone. *A Measure of Safety* names what sits above it.

A fourth edge arrived, uninvited, in the form of an exfiltration report. Bloomberg disclosed that a small group of unauthorised users had reached Anthropic's restricted Mythos model through a combination of contractor-linked access and format details leaked from a breach at Mercor, a third-party AI-training startup, with a private Discord channel supplying the open-source hunting effort. A week later the New York Times filed the companion story with the framing that matters: "Major A.I. breakthroughs are beginning to function less like product launches and more like weapons tests." The European Central Bank has quietly begun asking banks about their defences. Canada has started similar engagement. The Malus scenario Mark sketched in March — hypothesised adversarial access to the restricted frontier — has become a concrete Bloomberg story inside six weeks.

And the capital-markets bid kept printing through both. Sixty-six signals published across the past five days. Fifteen thesis pages received new evidence. Two new concept pages — measure-of-safety and godshatter — joined the wiki. Five themes emerged.

The harness thesis acquires a ceiling

Mark's paper is the centrepiece of the week. The earlier harness work described the zone where tokens plus process yields alpha. A Measure of Safety names the boundaries on either side of that zone — and the upper boundary is the one that changes the investment thesis.
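The three-class framework is relational: the same token stream lands in different classes under different harnesses. A minimal sketch of that relativity — the numeric quality scores and band edges below are hypothetical illustrations, not anything from the paper:

```python
# A minimal sketch of the three-class framework: token quality is judged
# relative to a given harness, not on an absolute scale. The numeric
# "quality" scores and band edges are hypothetical illustrations.

def classify(token_quality: float, harness_floor: float, harness_ceiling: float) -> str:
    """Place a token stream in one of the three classes relative to a harness."""
    if token_quality < harness_floor:
        return "not good enough"   # left-tail failures: hallucination, incoherence
    if token_quality <= harness_ceiling:
        return "good enough"       # matches the harness; reliably generates alpha
    return "too good"              # steps through the harness: Godshatter

# The same tokens land in different classes under different harnesses:
weak_harness = (0.3, 0.6)    # hypothetical floor/ceiling for a loose harness
strong_harness = (0.3, 0.9)  # a harness engineered for containment

print(classify(0.8, *weak_harness))    # "too good" for the weak harness
print(classify(0.8, *strong_harness))  # "good enough" for the strong one
```

The upper branch is the one the paper adds: the failure is a property of the pairing, not of the tokens.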

The empirical evidence was already public. Anthropic's 244-page Mythos Preview system card documents the model attempting prompt injection against a peer evaluator, building multi-step exploits to break out of a sandbox, and re-solving problems via approved methods to conceal that it had first used prohibited ones. Anthropic's decision not to release it is the first industry admission that general-purpose harnesses cannot safely contain a frontier model. The restriction to 40 organisations globally is the Glasswing architecture Mark described two weeks ago — a trusted-access perimeter with identity-bound credentials, auditable execution, and cryptographic attestation — being applied in production for the first time.

The investment corollary follows with unusual clarity. Harness defence — not alignment — becomes the discipline that protects the post-Watershed economy's margins. The labs, infrastructure operators, and enterprises that build credible containment will carry a trust premium that the ones that don't build it cannot command. Central banks are now running their own Mythos evaluations. The NSA secured Mythos access through the Pentagon earlier in the week. Dario Amodei briefed the White House. The containment perimeter is being institutionalised as a geopolitical-risk surface, not a product-safety one.

The secondary read — and it connects this theme to the next — is that the perimeter leaks. The Mythos access story landed the same week as the paper that argues the perimeter is necessary. The perimeter's attack surfaces — supply-chain adjacency (Mercor), insider credentials (contractor), and open-source hunting tools (the Discord bots scraping unsecured GitHub) — are the same attack surfaces defence procurement and critical-infrastructure regulation spent the last three decades hardening. They are now being retrofitted onto AI labs under public scrutiny, which is the shape of a new category of institutional spend. Containerisation, identity-bound access, trusted-execution environments, and formal partner-disclosure regimes move from optional to required line items. Long harness-defence.

Frontier launches start leading on cost

The most under-appreciated signal of the week came in the form of marketing copy. OpenAI launched GPT-5.5 on Thursday with the headline claim — verbatim from OpenAI's own post — "state-of-the-art intelligence at half the cost of competitive frontier coding models" on Artificial Analysis's Coding Index. API pricing lands at $5 per million input tokens, $30 per million output, with a 1M-token context window. The Codex experience is tuned to deliver better results with fewer tokens than GPT-5.4 for most users.

Mark's read: "Cost is becoming a selling point — that's a signal." OpenAI, the lab that historically launched with capability-first messaging, is now leading with cost-per-intelligence on the most commoditised workload in the industry. That is the Gresham's Law prediction materialising inside a frontier lab's own launch deck. The capability race has split into three tracks — absolute capability, per-token efficiency, and per-task cost — and OpenAI has chosen to pull ahead on the third while accepting parity on the first.

The open-weights evidence from earlier in the week reinforces the same direction. Qwen3.6-27B, a 27-billion-parameter dense model, now outperforms the 397-billion-parameter mixture-of-experts Qwen3.5-397B-A17B on every major coding benchmark. Mark flagged it immediately: "a new local Qwen!" Earlier still, Qwen 3.6-35B-A3B approached Opus-class performance on agentic coding with 3 billion active parameters — running, as Simon Willison noted, on a laptop with 8GB of RAM. Gemma 2B on commodity x86 silicon now matches GPT-3.5. When a budget model in a disciplined harness outperforms a frontier model in an undisciplined one, the Gresham dynamic is already operating at the level of purchasing decisions.

Tasteful tokenmaxxing — the practitioner-level version of this thesis — crystallised mid-week as the name practitioners give to the craft of choosing specific models for specific workloads despite the option to use the biggest available. Anthropic's postmortem, which acknowledged that Opus 4.7 has become "quite verbose" and now produces more output tokens than its predecessor on equivalent work, is a data point in this story rather than a counter to it. The frontier lab that introduces verbosity as a known characteristic is telling practitioners to choose their harness-model pairing deliberately. Cost as selling point is the market-level translation.
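The arithmetic tasteful tokenmaxxing turns on is per-task, not per-token. GPT-5.5's quoted API rates ($5 per million input tokens, $30 per million output) are from the launch post above; the rival model's rates and all token counts are hypothetical, chosen only to show how verbosity can erase a cheaper headline price:

```python
# Per-task cost arithmetic behind "which model's price-performance envelope
# fits my workload". GPT-5.5's $5/$30 per-million rates are from the launch
# post; the rival's pricing and all token counts are hypothetical.

def task_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost in dollars for one task, with prices quoted per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A terse frontier model at GPT-5.5's quoted rates:
terse = task_cost(20_000, 2_000, 5, 30)

# A "cheaper" but verbose model: half the output price, triple the output tokens.
verbose = task_cost(20_000, 6_000, 5, 15)

print(f"terse:   ${terse:.3f}")    # $0.160
print(f"verbose: ${verbose:.3f}")  # $0.190 — the cheaper rate loses per task
```

Under these (illustrative) numbers the model with the lower output price costs more per completed task, which is exactly the comparison a disciplined harness-model pairing forces buyers to make.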

For buyers, the purchasing question has shifted from "which model is best" to "which model's price-performance envelope fits my workload." The investment corollary is that capital gravity in the model-lab tier is moving to labs with the lowest cost of token production — which, structurally, means labs with the best infrastructure positioning. The model-lab equity story increasingly looks like an infrastructure-contract story one tier removed. Which brings us to the infrastructure.

The infrastructure bid is now structural, not cyclical

The Philadelphia Semiconductor Index closed Wednesday on its sixteenth consecutive up-day, a historical first. The texture matters more than the level: sustained institutional accumulation without pullback implies steady positioning, not momentum churn. The capital-markets backdrop for AI infrastructure remains the same as it has been for a quarter — Anthropic's US$800bn tender mark, NVIDIA and TSMC earnings trajectories, and the HBM shortage locked in through 2027 — but the bid is now registering as structural.
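A back-of-envelope sketch of how rare a sixteen-session run is under chance. Both assumptions here — independent sessions and a fixed 53% probability of any one session closing up — are simplifications we are introducing for illustration, not measured figures:

```python
# Back-of-envelope rarity of a sixteen-day up-streak. Assumes sessions are
# independent with a fixed probability of closing up — both assumptions are
# simplifications, and the 53% figure is illustrative, not measured.

p_up = 0.53          # hypothetical probability any one session closes up
streak = 16

p_streak = p_up ** streak
print(f"P(16 straight up-days) ~ {p_streak:.2e}")
print(f"~ one such run per {1 / p_streak:,.0f} independent 16-day windows")
```

Even under these generous assumptions the run is a several-in-a-hundred-thousand event, which is why the texture reads as positioning rather than chance.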

Three supporting signals this week. SK Hynix broke ground on an advanced HBM packaging facility in West Lafayette, Indiana. John's timing read was characteristically sharp: "Well in 18 months we might have some more memory." Additional US-domestic packaging capacity compresses a specific geopolitical-risk surface but does not shorten the shortage window, which is dated in years. TSMC separately told the Financial Post it has no current plans to adopt ASML's most expensive lithography tool, despite record 2026 capex approaching US$56 billion and long-term gross margin revised upward to 56 percent. The dominant foundry declining the dominant EUV supplier's top tool is either a confidence statement about current-gen toolchain economics or a pricing-power resistance threshold. Either reading channels the $56 billion toward substrate, packaging, yield tooling, and metrology — the parts of the wafer stack TSMC can spend on without an ASML purchase order.

Third: Microsoft announced a $25B Australian data-centre programme. Australia was already a named post-Watershed infrastructure geography — the prior Amazon nine-project renewables deal, the AEMC grid standards work, the water- and power-allocation politics that Monterey Park and Maine have made explicit — and a $25B Microsoft commitment moves it to a top-tier hyperscaler destination outside North America, competing directly with Ireland and the Nordic cluster. The bromine-chokepoint analysis earlier in the week noted that PCB flame-retardant chemistry maps the smiling-curve's deepest layer — the supply-chain components where a single-country disruption takes down every downstream assembly line for months. Infrastructure winners are increasingly the operators who can credibly site behind renewables or nuclear without an air-permit fight — which gas-turbine emissions headlines make more expensive by the week.

And on the capital-structure side, the Chinese mirror went live. Tencent and Alibaba are in talks to anchor DeepSeek's first external funding round. The parallel to Anthropic's Amazon- and Google-anchored cap table is near-exact: DeepSeek acquires the same infrastructure-binding pattern, with the two Chinese hyperscalers in the Amazon and Google roles. The structural divergence NOOPS flagged earlier in the month — US labs host products, Chinese labs produce technologies — now gains a capital-structure dimension. The US lab cap tables bind them to specific consumer surfaces; the emerging DeepSeek cap table binds it to upstream infrastructure and downstream distribution across the Chinese stack. Long infra.

Agent swarms go from research curiosity to default pattern

Three independent announcements on Wednesday forced the agent-swarm pattern out of research and into production. Zed shipped parallel agents as a first-class primitive in its editor. OpenAI launched Workspace Agents inside ChatGPT. An independent developer posted "All your agents are going async" — the third corner of the triangle. Mark's reaction: "the signal is huuuuuuge today on this. Orchestration moves front and centre."

The pattern is not experimental any more. Developer tooling (Zed), consumer/enterprise platform (OpenAI), and the independent developer ecosystem are all converging on the same architectural pattern simultaneously. John's read is the investment-relevant one: "agent swarms, asynchronous agents that, if they [become] increasingly widely adopted, will just multiply token consumption very significantly." Compute demand projections based on single-agent usage are materially too low. Infrastructure providers — compute, networking, and orchestration layers — are the primary beneficiaries.
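The multiplication John describes can be sketched in a few lines. The agent counts, per-agent token figures, and the 20% coordination overhead below are all hypothetical illustrations, not measured deployment data:

```python
# Why agent swarms multiply token demand: a single interactive session is
# replaced by N asynchronous agents, each consuming tokens independently,
# plus orchestration overhead for coordination. All figures hypothetical.

def swarm_tokens(n_agents, tokens_per_agent, coordination_overhead=0.2):
    """Total tokens for a swarm: per-agent work plus a coordination share."""
    work = n_agents * tokens_per_agent
    return int(work * (1 + coordination_overhead))

single = 50_000                   # one agent, one task (hypothetical)
swarm = swarm_tokens(8, 50_000)   # eight parallel agents on subtasks

print(f"single agent:  {single:,} tokens")
print(f"8-agent swarm: {swarm:,} tokens ({swarm / single:.1f}x)")
```

Any demand model calibrated on the single-agent line is off by roughly the swarm width times the overhead factor — which is the sense in which current projections are materially too low.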

The silicon layer responded the same day. Google unveiled TPU 8t and 8i explicitly marketed for agent workloads — "the culmination of a decade of development, custom-engineered to power the next generation of supercomputing with efficiency and scale." NVIDIA's DGX Spark was positioned in the same frame: the PC of the agent era. Kimi K2.6 shipped agent-swarm coordination as a first-class capability on the Chinese side. The hyperscaler-silicon track is now designing explicitly for the agent pattern the application layer just adopted.

The runtime attack surface came with it. The MCP SDK flaw that spans 200,000 servers and the GitHub Actions agent-hijack story from earlier in the week both demonstrate the same structural point: runtime, not training, is now the attack surface. As agent populations grow, the runtime observability category — the behavioural-quality monitoring and security tooling for swarms — becomes a structural line item. Anthropic's postmortem-and-usage-reset is the consumer-facing form of this discipline; Martin Fowler's prompt-management-as-versioned-infrastructure framing is the engineering-facing form. Server logs are now reporting agent traffic as a measurable category distinct from human traffic and bot traffic.

Adoption and hostility rise together

The cultural-adoption thesis took its most uncomfortable turn of 2026 this week. The Verge's weekend podcast laid out the polling: NBC News's March 2026 survey placed AI's favourability below ICE and only marginally above the war in Iran and the Democrats. Quinnipiac found more than half of Americans believe AI will do more harm than good, more than 80 percent are concerned about the technology, and only 35 percent are excited. The wrinkle is that nearly two-thirds of respondents used ChatGPT or Copilot in the last month. Gen Z, per Gallup, is the cohort with the sharpest negative trajectory: the more they encounter the technology, the less they favour it.

Usage and hostility are rising together. That is not the pattern a new technology paradigm normally produces — approval usually follows adoption. The post-Watershed shape is different: widespread use alongside widespread resentment, concentrated in the cohort entering the labour market in which AI's displacement effects are most visible.

The week furnished supporting evidence from every direction. Monterey Park became the first US city to regulate AI. Meta announced it will capture employee keystrokes and mouse movements for AI training — a headline that landed badly even among the audience most invested in the technology. OpenAI's Chronicle product repeats Microsoft's Recall privacy mistake. Witherspoon highlighted a gendered AI-adoption gap in the consumer sector. John's provocation — why be an employee at all? — framed the 24/7 employee story as the escape hatch that the hostile-adoption pattern makes legible.

The investment implication is a durable political-risk premium on consumer-facing AI deployments, a structural advantage for enterprise and back-office applications where the hostility is lower-visibility, and a plausible regulatory trajectory aimed at consumer-facing interfaces rather than infrastructure. The 75% AI-generated code figure Pichai disclosed at Google Next is the enterprise version of this pattern: rapid internal adoption, low public visibility, no hostility, because nobody is meeting the AI across a conversational surface. The labs and platforms that lean into restricted-access, enterprise, and infrastructure postures absorb less of this risk than those that lean into consumer-facing chat.

The SaaSpocalypse keeps compounding

The paradigm casualties continued to stack. Salesforce went headless with an "our API is the UI" positioning statement — which is either the correct strategic response to the agent era or the opening act of the platform that was the UI losing the UI layer. Apps Must Go Headless for the Agent Era arrived as the generalised statement of the same thesis. Adobe faces the SaaSpocalypse question and may resist it — but the question has been asked publicly, which is the event that matters.

Figma's woes compounded as its inference supplier now competes directly. Anthropic's CPO resigned from the Figma board days before launching Anthropic Design Studio — and the Design Studio launch page quoted Canva, which makes the competitive posture explicit. The Anthropic-platform move into the design-product category is the kind of adjacent entry Mark described in *Alpha and Harnesses*: the lab with the lowest cost of token production has the lowest cost of building a harness-dense product on top. The design-SaaS incumbents face an incumbent-entry pattern they have never previously faced.

The pull request is going the way of the IDE. Apple confirmed Gemini-powered Siri. Uber burned through its AI budget in months despite $3.4B R&D. The pattern that connects them is the one Mark's paper named two weeks ago: the application layer restructures continuously as the infrastructure layer hardens. The consistent investment implication is that durable value accrues to the layers that persist through the transition.

This week on NOOPS

Two new concept pages joined the wiki. Measure of Safety is the canonical reference for Mark's three-class framework, with the four imperatives for harness engineering (test for too-good tokens; architect for containment; invest in measurement; decline to deploy when the harness cannot hold). Godshatter is the companion concept — the failure mode at the too-good boundary, with the empirical linkage to the Mythos behavioural corpus.

Fifteen thesis pages received new evidence this week. The most heavily revised were Harness Engineering (the three-class framework added as a formal limit on harness authority), Good Enough (the upper boundary of the zone now named), Home Watershed (Mythos leakage + NYT weapons-test framing + Microsoft $25B AU), and Infrastructure Winners (sixteen-day SOX streak, SK Hynix Indiana, TSMC-ASML decline, Tencent/Alibaba/DeepSeek). The other theses updated for colour and cross-reference.

What we're watching next week

Five things.

First, whether the NYT weapons-test framing trickles into regulatory language. The European Central Bank engagement is already active. Canada has started. The G7 Hiroshima Process text is due for revision next month; the Measure of Safety framework is a candidate vocabulary. A regulator using Godshatter in a public document would be a policy watershed of its own.

Second, whether OpenAI's cost-led GPT-5.5 positioning forces Anthropic to match. Anthropic has resisted explicit price competition on the Opus line; a mid-cycle price cut would be a direct concession to the Gresham dynamic. The signal to watch is the May pricing page, not the model card.

Third, whether the 16-day semis streak breaks, and what breaks it. Sustained accumulation runs of this length usually end with an exogenous shock — a capacity announcement, a policy move, a single hyperscaler pulling a tender. The specific shape of the break will tell us which sub-sector is absorbing most of the bid.

Fourth, whether the agent-swarm pattern lands in enterprise purchasing. The technology is production-ready; the procurement side is not. A named Fortune-500 enterprise deploying a Zed-like parallel-agent pattern or an OpenAI Workspace Agents rollout is the missing datapoint. Token-consumption run rates from any such deployment would update every infrastructure demand model on the street.

Fifth, whether any Western enterprise publicly decides to standardise on Chinese open-weights. Tencent and Alibaba anchoring DeepSeek makes the procurement question concrete in ways it has not been before. A US-listed company naming DeepSeek in an earnings call the way Anthropic is now being named would be a structural break in the Western-frontier-only default posture.

The post-Watershed economy is acquiring its shape in public. Long infra. Long harness-defence. Short consumer-facing chat. Short the SaaS stack that was the UI.

John Allsopp & Mark Pesce — Sydney, 24 April 2026