Apple Research unveils UniGen1.5: A unified multimodal AI model for understanding, generating, and editing images

2025-12-20 21:55:00+08

Apple’s research team has unveiled UniGen1.5, its latest multimodal AI model, marking a significant leap in image processing technology. Unlike conventional approaches that handle image understanding, generation, and editing as separate tasks, UniGen1.5 integrates all three capabilities into a single, unified framework—dramatically improving efficiency and output quality.

A key innovation in UniGen1.5 is its “edit-instruction alignment” technique for image editing. Instead of directly modifying pixels, the model first generates a detailed textual description based on the original image and the user’s editing instruction. This “think-before-drawing” approach enables more accurate interpretation and execution of complex editing requests.
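To make the two-stage idea concrete, the following minimal Python sketch illustrates the control flow of a "think-before-drawing" edit. The function names, data types, and stubbed logic are illustrative assumptions for this article, not Apple's actual API or implementation.

```python
from dataclasses import dataclass


@dataclass
class EditRequest:
    image_path: str    # path to the original image
    instruction: str   # user's editing instruction, e.g. "make the sky stormy"


def describe_edit(request: EditRequest) -> str:
    """Stage 1 ("think"): produce a detailed textual description of the desired
    output, grounded in the source image and the editing instruction.
    In a real system this would be a multimodal model call; here it is stubbed."""
    return (
        f"A version of {request.image_path} where: {request.instruction}. "
        "All other objects, composition, and lighting remain unchanged."
    )


def render_from_description(image_path: str, description: str) -> bytes:
    """Stage 2 ("draw"): generate the edited image conditioned on both the
    original image and the intermediate description. Stubbed placeholder."""
    return f"<edited image of {image_path} per: {description}>".encode()


if __name__ == "__main__":
    req = EditRequest(image_path="cat.jpg", instruction="give the cat a red scarf")
    plan = describe_edit(req)                         # think before drawing
    edited = render_from_description(req.image_path, plan)
    print(plan)
```

The point of the intermediate description is that ambiguity is resolved in text, where the model reasons most reliably, before any pixels are produced.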

The model also advances reinforcement learning for vision tasks. Apple’s team developed a unified reward system that jointly optimizes both image generation and editing during training. This addresses the long-standing challenge of inconsistent quality metrics across editing tasks, allowing UniGen1.5 to maintain high performance across diverse visual applications.
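A rough sketch of what a shared reward might look like is shown below; the scoring functions are stand-ins on a common 0-to-1 scale, and the weighting is a hypothetical example rather than the paper's actual reward design.

```python
def prompt_alignment_score(prompt: str, image: bytes) -> float:
    """Stand-in for an automatic text-image alignment score in [0, 1]."""
    return 0.0  # placeholder


def preservation_score(source_image: bytes, edited_image: bytes) -> float:
    """Stand-in for how well unedited regions are preserved, in [0, 1]."""
    return 0.0  # placeholder


def unified_reward(sample: dict) -> float:
    """Score generation and editing rollouts on the same scale, so a single
    reinforcement-learning objective can cover both task types."""
    if sample["task"] == "generate":
        return prompt_alignment_score(sample["prompt"], sample["output"])
    # editing: balance instruction-following against preserving the rest of the image
    return 0.5 * prompt_alignment_score(sample["instruction"], sample["output"]) \
         + 0.5 * preservation_score(sample["source"], sample["output"])
```

Putting both task types behind one reward is what lets a single training loop optimize generation and editing jointly instead of tuning them against separate, inconsistent metrics.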

In standardized benchmarks, UniGen1.5 demonstrates state-of-the-art results:

  • 0.89 on GenEval
  • 86.83 on DPG-Bench
  • 4.31 on ImgEdit

These scores outperform leading open models such as BAGEL, BLIP-3o, and OmniGen2, and rival proprietary systems such as GPT-Image-1.

Despite its strengths, the researchers acknowledge limitations. UniGen1.5 occasionally generates inaccurate text within images and can exhibit attribute drift in specific editing scenarios—such as shifts in animal fur texture or color. The team says future work will focus on refining these aspects to enhance reliability and fidelity.

With UniGen1.5, Apple signals a strategic push toward end-to-end multimodal AI systems capable of sophisticated, human-aligned visual reasoning and creation.
