Meet Qwen-RobotSuite: Three Embodied AI Models for VLA Manipulation, Video World Modeling, and Navigation

MarkTechPost, June 16, 2026 at 4:51 PM

We break down Qwen-RobotSuite, the Qwen team's three new embodied AI models: RobotManip, a Vision-Language-Action (VLA) model built on Qwen3.5-4B for manipulation; RobotWorld, a language-conditioned video world model with a 60-layer MMDiT; and RobotNav, a navigation model built on Qwen3-VL in 2B, 4B, and 8B sizes.

We walk through the architecture, data pipelines, and benchmark results for each.

This is a summary. For the full story, read the original article at MarkTechPost.

