Alibaba’s New Image Model Doubles Compression and Speeds Up Generation

Last updated: May 22, 2026 11:58 pm

By AIWadmin


Stronger Compression for Faster Training

Alibaba’s new Qwen-Image-2.0 model achieves a major efficiency gain by using a variational autoencoder (VAE) that compresses images sixteenfold along each spatial axis, twice the per-axis factor used by most open-source models. Comparable open-source models, such as FLUX.1-dev and HunyuanVideo, typically rely on eightfold spatial downsampling. Compressing this aggressively usually sacrifices fine detail, but the Qwen team countered the loss by adding skip connections that preserve fine-grained information around the VAE’s bottleneck layers. They also shaped the latent space during early training to capture semantically meaningful structures, giving the image transformer a cleaner representation to work with.
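To make the compression figures concrete, the sketch below compares latent grid sizes for eightfold and sixteenfold per-axis downsampling. The function names and the 1024x1024 example resolution are illustrative assumptions, not details from Alibaba’s report, and the actual number of transformer tokens also depends on patching and channel choices the article does not state.

```python
def latent_grid(height: int, width: int, factor: int) -> tuple[int, int]:
    """Return (latent_height, latent_width) after downsampling each axis by `factor`."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by the factor")
    return height // factor, width // factor


def latent_positions(height: int, width: int, factor: int) -> int:
    """Count the spatial positions in the latent grid."""
    latent_h, latent_w = latent_grid(height, width, factor)
    return latent_h * latent_w


if __name__ == "__main__":
    # Hypothetical 1024x1024 input; the resolution is an assumption for illustration.
    for factor in (8, 16):
        h, w = latent_grid(1024, 1024, factor)
        print(f"{factor}x per axis: {h}x{w} grid, {h * w} positions")
```

Because the factor applies to both height and width, doubling it from 8 to 16 cuts the number of latent positions to one quarter, which is where the training and inference savings would come from under these assumptions.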

Architectural Changes Speed Up Inference

The transformer at the core of Qwen-Image-2.0 processes both image and text tokens in a single stream, using frozen weights from Alibaba’s Qwen3-VL vision-language model for text conditioning. The team made two structural modifications to prevent unstable activations: they simplified an internal scaling mechanism and stabilized the final layer normalization. These changes allow the model to generate high-quality photorealistic images in as few as four generation steps, down from the 40 steps typically required by earlier systems. Alibaba’s technical report notes that the model’s outputs include portraits, animal close-ups, nature scenes, and game screenshots with legible on-screen text.
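The step count matters because each generation step normally costs one pass through the transformer. The sketch below is a generic Euler-style sampling loop with a stand-in denoiser that only counts its calls. The article does not describe the sampler Qwen-Image-2.0 uses, so the loop, the `CountingDenoiser` class, and all other names here are assumptions for illustration only.

```python
from typing import Callable, List

Vector = List[float]


def sample(denoise: Callable[[Vector, float], Vector], start: Vector, steps: int) -> Vector:
    """Integrate from t=1 (noise) toward t=0 using `steps` equal Euler steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    x = list(start)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        velocity = denoise(x, t)  # one network forward pass per step
        x = [xi - dt * vi for xi, vi in zip(x, velocity)]
    return x


class CountingDenoiser:
    """Hypothetical stand-in for the image transformer; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: Vector, t: float) -> Vector:
        self.calls += 1
        return [0.0 for _ in x]  # placeholder velocity that leaves x unchanged


def count_forward_passes(steps: int) -> int:
    """Run the toy sampler and report how many times the denoiser was called."""
    denoiser = CountingDenoiser()
    sample(denoiser, [0.0, 0.0], steps)
    return denoiser.calls
```

In this toy setup, `count_forward_passes(4)` and `count_forward_passes(40)` differ by a factor of ten in denoiser calls. Real per-step cost can differ, for example when guidance requires more than one pass per step.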

Source: The-Decoder

