A compression algorithm dropped last week, and memory chip stocks fell right behind it. Google Research published TurboQuant on March 24, a method that shrinks the memory footprint of large language models roughly sixfold with what the researchers claim is zero accuracy loss. Shares of major memory manufacturers shed billions in value within a day.
That kind of ripple reaches every business built on server-side AI, from chatbot operators to online gambling services where new users routinely search for a 1xbet registration promo code ahead of signing up. TurboQuant is still a lab result, technically. But markets didn’t wait for deployment.
How Three Bits Replace Sixteen
Every time a large language model generates a token, it stores attention data in something called the key-value cache. Think of it as the model’s short-term memory during a conversation. As context windows grow longer, the cache balloons, eating GPU memory that costs real money per second.
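To see why that matters at scale, here is a back-of-envelope calculation. The model dimensions below (80 layers, 8 KV heads, 128-dimensional heads, a 128k-token context) are illustrative assumptions for a large model, not figures from the paper:

```python
# Back-of-envelope KV cache sizing. All model dimensions are
# illustrative assumptions, not TurboQuant specifics.

def kv_cache_bytes(layers, kv_heads, head_dim, context_len, bits_per_value):
    # Two tensors per layer (keys and values), one entry per head per position.
    values = 2 * layers * kv_heads * head_dim * context_len
    return values * bits_per_value / 8

GIB = 1024 ** 3
fp16 = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                      context_len=128_000, bits_per_value=16)
q3 = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                    context_len=128_000, bits_per_value=3)

print(f"16-bit cache: {fp16 / GIB:.1f} GiB")  # grows linearly with context length
print(f" 3-bit cache: {q3 / GIB:.1f} GiB")
print(f"reduction: {fp16 / q3:.1f}x")
```

For these assumed dimensions the 16-bit cache lands near 39 GiB for a single long conversation, which is why multi-tenant serving is so memory-bound.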
TurboQuant compresses each cached value from the standard 16 bits down to just 3. The peer-reviewed paper, set for formal presentation at ICLR 2026 on April 25, pulls this off in two stages. A polar coordinate conversion makes the data distribution predictable, and then a 1-bit error correction step cleans up what’s left. No retraining. No dataset-specific tuning. The algorithm can be applied to any existing model right out of the box.
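The paper's actual pipeline, the polar-coordinate transform and the 1-bit error correction, has no public code yet, but the basic mechanics of squeezing a value into 3 bits can be sketched with a plain per-row uniform quantizer. Everything below is an illustrative baseline, not TurboQuant itself:

```python
# A minimal per-row 3-bit uniform quantizer -- a baseline sketch only.
# TurboQuant's polar transform and error-correction stages are NOT
# reproduced here; this just shows mapping floats onto 2**3 = 8 levels.

def quantize_3bit(row):
    # Symmetric levels centered on zero: codes 0..7 map to -3.5..+3.5 steps.
    scale = max(abs(v) for v in row) / 3.5 or 1.0
    codes = [min(7, max(0, round(v / scale + 3.5))) for v in row]
    return codes, scale

def dequantize_3bit(codes, scale):
    return [(c - 3.5) * scale for c in codes]

row = [0.12, -0.87, 0.45, 0.03, -0.31, 0.99, -0.54, 0.20]
codes, scale = quantize_3bit(row)
approx = dequantize_3bit(codes, scale)
max_err = max(abs(a - b) for a, b in zip(row, approx))
```

With only 8 levels, the worst-case rounding error is half a quantization step per value; the whole point of TurboQuant's extra transform and correction stages is to keep that error from showing up as accuracy loss.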
On Nvidia H100 hardware, Google reported up to 8x faster attention computation at 4-bit precision. Independent developers reproduced the core claims within hours of the blog post going live, including a working implementation on Apple Silicon built in under 30 minutes using GPT-5.4 to write the code.
The Market Moved Before Peer Review
Investors didn’t need a deep understanding of polar coordinates to read the headline number. Six times less memory. Here is what happened to chip stocks over the two trading days after the announcement.
| Company | Sector | Approximate Drop |
| --- | --- | --- |
| SK Hynix | DRAM/HBM | 6% |
| Samsung | Memory | 5% |
| Kioxia | NAND Flash | 6% |
| Micron | Memory/Storage | 4% |
| Western Digital | Storage | 3% |
Wells Fargo analyst Andrew Rocha flagged the obvious question. If AI inference suddenly needs 80% less memory, how much hardware does the industry really need? But the compression only targets inference, the phase where models talk to users. Training workloads, which consume the lion’s share of high-bandwidth memory, remain completely untouched. TrendForce still projects standard DRAM contract prices rising 55-60% quarter-on-quarter through early 2026, and HBM demand hasn’t flinched.
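The two headline figures are consistent with each other, which a quick sanity check confirms: cutting cache entries from 16 bits to 3 is a 5.3x reduction, or roughly 80% less memory for that cache.

```python
# Sanity-checking the headline numbers: 16-bit -> 3-bit cache entries.
bits_before, bits_after = 16, 3
compression = bits_before / bits_after   # ~5.3x, the "roughly six times"
saved = 1 - bits_after / bits_before     # 0.8125, the "~80% less memory"
print(f"{compression:.1f}x smaller, {saved:.0%} memory saved")
```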
Every GPU-Dependent Industry Felt It
Sportsbook platforms, live odds engines, recommendation systems for betting operators, customer support bots, medical imaging pipelines, ad targeting stacks. All of these run on the same GPU racks, and all of them pay inference bills that scale with memory consumption. When a single algorithm threatens to shrink that consumption by a factor of six, budget conversations change overnight.
The irony is that cheaper inference rarely leads to smaller spending. You already know the pattern if you’ve watched cloud computing long enough. Cheaper storage just meant more data, not leaner budgets. Jevons Paradox has haunted hardware suppliers since the steam engine, and there’s little reason to think AI will behave differently.
What Happens After April 25
Google hasn’t released open-source code yet. The official release is expected in Q2 2026, probably timed around the ICLR conference presentation. Community developers have already built working implementations for llama.cpp and MLX, but mainstream adoption through frameworks like vLLM is what will start the real curve.
Here’s the catch, though. The paper’s math lands within a factor of roughly 2.7 of the Shannon limit, the information-theoretic floor on how little distortion any quantizer can achieve at a given bit-width. That means the easy gains from KV cache squeezing are mostly spoken for. Whatever comes next will have to find efficiency somewhere else in the stack.
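For readers curious what "within a factor of 2.7 of the Shannon limit" could mean concretely, one standard yardstick (an assumption here; the paper may formalize it differently) is the distortion-rate function of a Gaussian source:

$$D(R) = \sigma^{2} \, 2^{-2R}$$

At $R = 3$ bits per value, that floor is $\sigma^{2}/64$, and a quantizer within a factor of 2.7 of it incurs at most about $2.7 \cdot \sigma^{2}/64$ of mean-squared error. There simply isn't much distance left between that and the theoretical best.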
You’re watching the market reprice an entire hardware dependency on a paper that most traders probably haven’t read past the abstract. Wouldn’t be the first time.