Kuaishou unveils and open-sources next-gen multimodal AI model Keye-VL-671B-A37B
Source: Saiyp | Date: 2025-11-29 20:08:00
Kuaishou has officially launched Keye-VL-671B-A37B, its next-generation flagship multimodal model, and simultaneously open-sourced its code. Touted as “good at seeing and thinking,” the model delivers top-tier performance in visual understanding, video analysis, and mathematical reasoning—solidifying Kuaishou’s position in AI innovation.
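Because the release is open-source, the model should be loadable with standard tooling. The sketch below uses Hugging Face transformers; the repository id, message format, and generation settings are assumptions for illustration, not confirmed details of the release.

```python
# Hypothetical usage sketch: the repo id and message format are assumptions,
# not confirmed details of the Keye-VL-671B-A37B release.
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

MODEL_ID = "Kwai-Keye/Keye-VL-671B-A37B"  # assumed repository name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # let the checkpoint pick its native precision
    device_map="auto",    # shard across available GPUs; a 671B MoE needs many
    trust_remote_code=True,
)

image = Image.open("chart.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]

# Render the chat template to a prompt, then pack text and image together.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```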
Built on the DeepSeek-V3-Terminus large language model and paired with KeyeViT, a vision encoder evolved from the one used in Keye-VL-1.5, Keye-VL-671B-A37B brings systematic upgrades in visual perception, cross-modal alignment, and complex reasoning. Pre-training ran on 300 billion high-quality, filtered tokens in three stages: initial alignment, full-parameter pre-training, and high-quality annealing. The result is strong accuracy and stability on both everyday and complex tasks.
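To make the division of labor concrete, here is a schematic PyTorch sketch of the general pattern the article describes: a vision encoder produces patch features, a projector maps them into the language model's embedding space, and the LLM reasons over the mixed sequence. The class names, dimensions, and the simple linear projector are illustrative assumptions; the actual KeyeViT-to-backbone wiring is not specified in the article.

```python
# Schematic sketch of a generic vision-encoder + LLM composition.
# Dimensions, module names, and the linear projector are illustrative
# assumptions, not the actual Keye-VL-671B-A37B internals.
import torch
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder             # stands in for KeyeViT
        self.projector = nn.Linear(vision_dim, llm_dim)  # cross-modal alignment
        self.llm = llm                                   # stands in for the LLM backbone

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        # Encode the image into a sequence of patch features.
        patch_feats = self.vision_encoder(pixel_values)  # (B, P, vision_dim)
        # Project patches into the language model's embedding space.
        visual_tokens = self.projector(patch_feats)      # (B, P, llm_dim)
        # Prepend visual tokens to the text embeddings and let the LLM
        # attend across both modalities in one sequence.
        sequence = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm(sequence)
```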
Its post-training pipeline includes supervised fine-tuning, cold-start optimization, and reinforcement learning, with tasks spanning visual QA, chart interpretation, and rich-text OCR.
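To illustrate how that task coverage is typically expressed in supervised fine-tuning data, the sketch below shows chat-style samples for the three task families named above. The schema and the sample contents are assumptions for illustration; Kuaishou's actual training format is not published here.

```python
# Illustrative SFT samples for the task families named in the article.
# The field names and all sample contents are assumed, chat-style
# placeholders, not Kuaishou's published training data.
sft_samples = [
    {   # visual question answering
        "images": ["street_scene.jpg"],
        "messages": [
            {"role": "user", "content": "<image> How many cyclists are visible?"},
            {"role": "assistant", "content": "Three: two near the crosswalk, one by the bus stop."},
        ],
    },
    {   # chart interpretation
        "images": ["revenue_chart.png"],
        "messages": [
            {"role": "user", "content": "<image> Which quarter had the highest revenue?"},
            {"role": "assistant", "content": "Q3, roughly 15% above Q2."},
        ],
    },
    {   # rich-text OCR
        "images": ["poster.png"],
        "messages": [
            {"role": "user", "content": "<image> Transcribe all text in reading order."},
            {"role": "assistant", "content": "GRAND OPENING\nSaturday, June 14\n50% off all items"},
        ],
    },
]
```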
Looking ahead, Kuaishou plans to evolve Keye-VL into an intelligent multimodal agent capable of autonomous tool use, complex problem-solving, and multi-step reasoning. Future work will focus on "thinking with images" and "thinking with videos": enabling the model not just to perceive visual content but to reason deeply over it.
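A multimodal agent of the kind described usually wraps the model in a loop that interleaves reasoning with tool calls. The minimal sketch below shows that control flow only; the `model_step` interface, the tool registry, and the action format are hypothetical, included to make "autonomous tool use with multi-step reasoning" concrete rather than to depict Kuaishou's design.

```python
# Minimal agent-loop sketch: the model alternates between reasoning steps
# and tool calls until it emits a final answer. `model_step`, the tool
# registry, and the action format are hypothetical illustrations.
from typing import Callable

def crop_region(image_path: str, box: list) -> str:
    """Hypothetical tool: crop a region so the model can 'look closer'."""
    return f"cropped:{image_path}:{box}"

TOOLS: dict[str, Callable] = {"crop_region": crop_region}

def run_agent(model_step: Callable, task: str, max_steps: int = 8) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model_step(history)  # returns either a tool call or an answer
        if action["type"] == "final_answer":
            return action["content"]
        # Execute the requested tool and feed the observation back in,
        # letting the model reason over intermediate visual evidence.
        result = TOOLS[action["tool"]](*action["args"])
        history.append({"role": "tool", "content": result})
    return "No answer within the step budget."
```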
By combining foundational model strength with advanced agent capabilities, Kuaishou aims to push the boundaries of general, reliable, and reasoning-capable multimodal AI—ushering in a new era of intelligent systems.