AIWatcher

Google’s Gemma 4 Gets a Cheat Code: Up to 3x Faster by Drafting Tokens Ahead

AIWadmin
Last updated: May 23, 2026 12:05 am

The Slow Grind of Local AI Gets a Turbo Button

Let’s be honest: running a large language model on your own hardware has always felt like a compromise. You gain privacy but pay for it in pokey performance, watching tokens dribble out one agonizing step at a time. Google just threw a wrench in that tradeoff with its new Multi-Token Prediction (MTP) drafters for Gemma 4. This isn’t some hypothetical research paper. It’s a live, downloadable release that promises up to triple the inference speed with zero quality loss. If that holds up, it’s the biggest practical leap for edge AI since quantization.


The dirty secret of local inference is that your gaming GPU is mostly bored. Generating text one token at a time is limited by memory bandwidth, not compute: for every token, the GPU has to stream the model’s weights out of VRAM, and its arithmetic units sit idle while they wait. Google exploits that dead air with a tiny 74-million-parameter drafter model that guesses multiple future tokens in a single pass. The big model then checks the whole batch of guesses in one parallel pass, keeping them up to the first one it disagrees with. It is speculative execution for AI, and it cleverly turns idle compute into extra throughput.
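
To make the draft-and-verify loop concrete, here is a minimal Python sketch of greedy speculative decoding. It is not Google’s MTP implementation: `target` and `drafter` are hypothetical toy stand-ins for the big model and the drafter, the toy drafter proposes its `k` guesses one at a time (the article says Google’s drafter produces them in a single pass), and the verification loop calls the target once per position where a real inference engine would score all positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: maps a token sequence to the
# single most likely next token (greedy decoding).
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one call to the big model per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: Model, drafter: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification_rounds)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        # 1. The cheap drafter guesses k future tokens.
        draft: List[int] = []
        ctx = list(seq)
        for _ in range(k):
            guess = drafter(ctx)
            draft.append(guess)
            ctx.append(guess)
        # 2. The big model checks each draft position. A real engine scores
        #    all k+1 prefixes in one batched forward pass; we loop for clarity.
        rounds += 1
        for i in range(k + 1):
            expected = target(seq + draft[:i])
            if i < k and draft[i] == expected:
                continue  # guess confirmed, check the next one
            # First wrong guess (or all k confirmed): keep the confirmed
            # prefix plus the big model's own token for this position.
            seq.extend(draft[:i])
            seq.append(expected)
            break
    return seq[:goal], rounds


# Toy models (hypothetical): the drafter agrees with the target except
# when the context length is a multiple of 4, where it guesses wrong on purpose.
def target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def drafter(seq: List[int]) -> int:
    guess = target(seq)
    return guess if len(seq) % 4 else (guess + 1) % 10


out, rounds = speculative_decode(target, drafter, [1], n_new=12, k=4)
assert out == greedy_decode(target, [1], n_new=12)  # identical output
print(rounds)  # 3 verification rounds instead of 12 sequential steps
```

Every token that survives verification is exactly the token the target would have picked by itself, which is why the output is identical to plain greedy decoding. The speedup comes from confirming several drafted tokens per verification round, so it depends directly on how often the drafter guesses right.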

A Generous Pivot or a Clever Trap?

Google isn’t just open-sourcing the tech. It also moved Gemma 4 to the Apache 2.0 license, a major departure from the restrictive custom license that covered earlier Gemma releases. The move looks generous, but cynical observers note that it arrives just as regulators are circling Big Tech’s walled gardens. By giving the models away, Google keeps developers tied to the rest of its ecosystem: the frameworks and hardware support built around them. Still, for hobbyists and privacy-conscious developers, this is a win. You can grab the MTP-enabled models and run them via MLX, Ollama, or vLLM right now.

The real-world benchmarks are impressive but come with caveats. Pixel phones see the promised 3x boost on the small E4B model, while an Apple M4 Mac gets a 2.5x uplift on the much larger 31B dense model. Google claims ‘zero quality degradation’ because the main model still verifies every token the drafter proposes, so the output matches what the main model would have generated on its own. That is mathematically sound, but it sidesteps the fact that a bad answer delivered 3x faster is still a bad answer. The speed gain is real; the underlying model’s alignment and factual accuracy are untouched, for better or worse.
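
The ‘zero quality degradation’ claim rests on that verification step. When the model samples rather than always picking its top token, the standard speculative-sampling rule accepts a drafted token x with probability min(1, p(x)/q(x)), where p is the main model’s distribution and q is the drafter’s, and otherwise resamples from the leftover mass max(0, p - q). The article does not say whether Gemma 4’s MTP verification uses exactly this rule, so treat the sketch below as an illustration of the general technique. The function names and the three-token toy distributions are hypothetical; the code uses exact fractions to compute the output distribution of one step and check that it equals p.

```python
import random
from fractions import Fraction as F
from typing import Dict, Tuple

Dist = Dict[str, F]  # token -> probability


def speculative_sample(p: Dist, q: Dist, rng: random.Random) -> Tuple[str, bool]:
    """One speculative step. p: main model, q: drafter.
    Returns (emitted_token, draft_was_accepted)."""
    drafts = list(q)
    x = rng.choices(drafts, weights=[float(q[t]) for t in drafts])[0]
    # Keep the drafted token with probability min(1, p(x) / q(x)).
    if rng.random() < float(min(F(1), p.get(x, F(0)) / q[x])):
        return x, True
    # Otherwise resample from the normalized residual max(0, p - q).
    vocab = list(p)
    weights = [float(max(F(0), p[t] - q.get(t, F(0)))) for t in vocab]
    return rng.choices(vocab, weights=weights)[0], False


def speculative_output_dist(p: Dist, q: Dist) -> Dist:
    """Exact probability of each token being emitted by speculative_sample."""
    vocab = set(p) | set(q)
    # Drafted and accepted: q(x) * min(1, p(x)/q(x)) = min(p(x), q(x)).
    accepted = {x: min(p.get(x, F(0)), q.get(x, F(0))) for x in vocab}
    reject_prob = 1 - sum(accepted.values())
    # Rejected: mass is redistributed in proportion to max(0, p - q).
    residual = {x: max(F(0), p.get(x, F(0)) - q.get(x, F(0))) for x in vocab}
    z = sum(residual.values())
    return {x: accepted[x] + (reject_prob * residual[x] / z if z else F(0))
            for x in vocab}


p = {"a": F(1, 2), "b": F(1, 3), "c": F(1, 6)}  # toy main-model distribution
q = {"a": F(1, 4), "b": F(1, 2), "c": F(1, 4)}  # toy drafter distribution
print(speculative_output_dist(p, q) == p)  # True: the drafter leaves p intact
```

The identity behind it is short: the total leftover mass, the sum of max(0, p - q), equals the rejection probability 1 minus the sum of min(p, q), so each token ends up with min(p, q) + max(0, p - q) = p. The drafter changes how quickly tokens arrive, not which tokens are likely, which is exactly why speed and answer quality are separate questions.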

Source: Ars Technica
