ByteDance's EdgeBench Finds AI Learning Speed Now Doubling Every Three Months as the UK Reframes Capability Around Test-Time Compute
AI is now learning from its environment, not just recalling training
AI is now learning from its environment, not just recalling training. ByteDance's new EdgeBench evaluation ran 134 tasks of twelve-plus hours each to measure what AI agents actually learn from their environments, as opposed to what they merely recall from training. The noisy learning curves collapsed into a clean pattern: AI learning speed is now doubling every three months. The UK's AI Safety Institute pushed back a little, arguing that most benchmarks measure the wrong thing. Its point is that capability is really a curve over how much test-time compute you give a model, and it showed one model's cyber time horizon stretching from two hours to fourteen just by raising its token budget from 2.5M to 50M. In plain terms, the longer a model is allowed to think, the more it can actually do.
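For a rough sense of scale, the two figures in this item can be turned into back-of-the-envelope numbers. The sketch below is only arithmetic on the quoted values: the helper names are made up for illustration, and the power-law fit through two points is an assumption of this sketch, not something EdgeBench or the AI Safety Institute is reported to have claimed.

```python
import math


def growth_per_year(doubling_months: float) -> float:
    """Multiplicative growth over 12 months for a given doubling time."""
    return 2 ** (12 / doubling_months)


def loglog_slope(x0: float, y0: float, x1: float, y1: float) -> float:
    """Exponent k of an ASSUMED power law y ~ x**k through two points."""
    return math.log(y1 / y0) / math.log(x1 / x0)


# "Doubling every three months" compounds to 2**4 = 16x per year.
annual_factor = growth_per_year(3)

# Quoted endpoints: 2.5M tokens -> 2 hours, 50M tokens -> 14 hours.
token_ratio = 50e6 / 2.5e6          # 20x more test-time tokens
horizon_ratio = 14 / 2              # 7x longer time horizon
k = loglog_slope(2.5e6, 2, 50e6, 14)

print(f"annual growth factor: {annual_factor:.0f}x")
print(f"{token_ratio:.0f}x tokens bought a {horizon_ratio:.0f}x horizon (slope ~{k:.2f})")
```

Read this way, twenty times the token budget bought roughly seven times the horizon, so the returns are real but sublinear on these two points. Two points cannot establish the shape of the curve, which is the Institute's argument for reporting capability as a curve in the first place.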
The gains are also getting cheaper to install
The gains are also getting cheaper to install. New research showed that training a single mid-stack transformer layer can match or beat full-parameter reinforcement learning, meaning you can get comparable performance for a fraction of the training cost. It's no surprise the top spot on the leaderboard never sits still anymore. Seventeen different models have taken the lead since Claude 3 Opus dethroned GPT-4, and each new champion holds the crown for a median of just seven weeks. The latest flex was architectural. Claude Fable 5 just wrote KernelBench-Mega's first genuine megakernel, fusing an entire model decode step into a single cooperative launch for an 18.7x speedup over the reference implementation. It reportedly spent most of the session silently timing baselines before writing the whole thing in one shot.
Not everyone is compounding on that same schedule
Not everyone is compounding on that same schedule, though. Meta's Alexandr Wang told staff this week that the company's in-training Watermelon model had caught GPT-5.5 on some unnamed benchmarks, but Mark Zuckerberg conceded in the same room that agentic progress had gone slower than they had hoped. The open-weight side is closing the gap anyway. A new technique called ARTS lets a test-time-trained Qwen3-4B match Gemini-3 Pro at 5x lower cost by diagnosing whether a failure came from bad code or a bad hypothesis, and adjusting accordingly.
Turned loose, agents are now running their own loops in both directions
Turned loose, agents are now running their own loops in both directions. One intern's founder-agent ran 2,000 customer interviews and 100 product concepts to ship a shopping app called StyleFits, which won over 400 paying users but spent $2,000 on ads to earn back only $1,293. Less charmingly, researchers documented the first end-to-end agentic ransomware, called JADEPUFFER. It ran an entire extortion campaign through a Langflow vulnerability, narrating its own actions and wiping a production database along the way. This is real, and it's the shape of what a lot of security work is about to look like.
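The StyleFits numbers are easier to judge as unit economics. The sketch below uses only the figures quoted in this item and makes two assumptions of its own: that the $1,293 is the total revenue attributable to the ad spend, and that the paying-user count is treated as exactly 400. The variable names are illustrative.

```python
ad_spend = 2000.0      # dollars spent on ads (quoted)
revenue = 1293.0       # dollars earned back (quoted)
paying_users = 400     # ASSUMPTION: the reported count taken as exactly 400

return_on_ad_spend = revenue / ad_spend          # dollars back per ad dollar
net_result = revenue - ad_spend                  # negative means a loss
cost_per_paying_user = ad_spend / paying_users
revenue_per_paying_user = revenue / paying_users

print(f"return on ad spend: {return_on_ad_spend:.2f}")
print(f"net result: {net_result:+.0f} dollars")
print(f"cost per paying user: {cost_per_paying_user:.2f} dollars")
print(f"revenue per paying user: {revenue_per_paying_user:.2f} dollars")
```

On those assumptions the agent got back about 65 cents per ad dollar, a $707 loss so far. Whether that is a failure depends on repeat purchases, which the reported figures do not cover.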
Autonomy this portable is also making control political
Autonomy this portable is also making control political. Alibaba banned Claude Code this week over telemetry concerns that it could fingerprint China-linked users, and steered staff toward an in-house tool called Qoder instead, all amid Anthropic's ongoing model distillation dispute with the company. Palantir responded with a nine-point sovereignty statement warning that "controlling your weights is controlling your fate," and that "tokenmaxxing" only buys the addictive feeling of false progress. One observer read the whole thing as a canary in the coal mine, pointing out that France, Germany, and Spain are all currently showing Palantir the door. If US allies are refusing to depend on a company "capable of turning off the tap," we may be watching the closed-source rent model and captive-market playbook end at the same time.
Silicon keeps out-designing its designers
Silicon keeps out-designing its designers too. Princeton is now using reinforcement learning and diffusion models to draw RF circuits that look like QR codes and outperform human-designed layouts, cutting the design cycle from months to minutes. Nvidia, for its part, is monetizing the compute hunger directly with a new revenue-sharing program that trades GPUs and token credits for a slice of a startup's future revenue. That scarcity is real enough that the chip industry lobby actually warned Washington this week that meddling with memory prices would only deepen the shortage. Ground truth still bites, though. Blackstone's QTS just abandoned its slice of a 2,100-acre Virginia data center campus that was slated to sit next to a Civil War battlefield, handing local residents a rare win.
Biology is compounding on its own curve
Biology is compounding on its own curve. A new Nature study found that a protein called GPNMB sits on both glioblastoma cells (a devastating brain cancer) and on the myeloid immune cells that protect the tumor from being attacked. Anti-GPNMB CAR-T cells engineered to target both compartments at once achieved durable and often curative control in mice. Professor Sheila Singh's team reframed the tumor as a "connected tumor-immune ecosystem," which is a promising new angle against a cancer where barely 5% of patients survive five years. Zoom out and the broader trend is showing up too. The CDC reports that the US death rate just fell 4.6% to a low of about 689 per 100,000, with a sharp drop in young overdose deaths pushing life expectancy toward a new high, even as heart disease and cancer rates rose.
Policy is scrambling to keep up with all of this
Policy is scrambling to keep up with all of this. The President said this week that he wants "some guard rails" on AI but "as little as possible," insisting the technology is bigger than the internet. Tesla, on the other hand, is metering the zeal internally, capping its engineers at $200 a week of AI tool spending. Chamath Palihapitiya read that cap as a data point marking where the spend actually turns to waste given the caliber of Tesla's talent. Human labor itself is being quietly deprecated too. Labor-force participation just slid to 61.5%, which is a 50-year low outside of the COVID window, with 720,000 people stepping out of the workforce entirely. Headline unemployment still fell to 4.2% in the same period, because people who leave the labor force are no longer counted as unemployed. That is basically the first real readout of a post-labor economy through instruments that were built to measure the old one.
A couple of policy beats worth flagging
A couple of policy beats worth flagging. Against a wave of state bills that would license AI usage by statute, a new Right to Intelligence campaign is arguing that people should be free to run open-weight models while fraud and CSAM stay prosecuted separately. And Japan's Supreme Court just closed the door on naming an AI system as the inventor on a patent application, holding that only natural persons qualify.
That's today. More tomorrow.
Matthew Ortiz
CEO, OTZ Group