A robot produced by AgiBot folds clothes. Photo: Courtesy of AgiBot
Shanghai-based robotics start-up AgiBot has launched Genie Envisioner (GE), a unified video-generation platform designed for real-world use that integrates prediction, policy learning, and neural simulation, the company told the Global Times on Thursday.
The platform, which operates within a single video-generative framework, is the first of its kind in the industry, the company said.
"World models for robotics should learn, act, and evaluate in one loop. We're releasing Genie Envisioner: a unified, video generative platform that integrates prediction, policy learning, and neural simulation together," the company said on social media platform X last week.
GE is a powerful foundation for building general-purpose, instruction-driven embodied intelligence, said AgiBot, adding that it will make all code, models, and benchmarks related to the platform open source.
GE's vision-centric world modeling has opened up a new technical path for robot learning, according to AgiBot. The company said the release marks a shift in robotics from passive execution to an active "imagine-verify-act" approach.
The research team will expand the platform's sensor modalities to support full-body mobility and human-robot collaboration, and will continue to promote the practical application of robots in intelligent manufacturing and services, the company said.
Existing systems for training robots rely on fragmented stages of data collection, training, and evaluation, while GE aims to integrate these processes into a unified platform.
Its core, GE-Base, is a large-scale, instruction-driven video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world tasks. It has been trained on about 3,000 hours of paired video-language data spanning more than 1 million real-world robotic manipulation episodes, establishing a "mapping from language instructions to an embodied visual space, capturing the essence of robotic manipulation by modeling the spatial, temporal, and semantic regularities of real-world interactions," the company added.
Extensive real-world tests highlight the system's superior performance, the company said. The platform allows robots to plan tasks such as folding clothes and sorting items on a conveyor belt more effectively. In addition, its high-fidelity physical simulation allows robots to predict changes arising from their interaction with the environment.
At the recent World Robot Conference in Beijing, robots equipped with GE completed complex tasks like making sandwiches, pouring tea, and using microwave ovens, with a success rate exceeding the industry average, AgiBot said.
The breakthrough represented by AgiBot's GE platform is expected to form a strong foundation for general-purpose, instruction-driven embodied intelligence, Zhong Xiangyun, a humanoid robot industry observer, told the Global Times on Thursday.
The rollout of GE not only showcases the Chinese robotics company's technological prowess but also paves the way for more efficient and scalable robot development, Zhong said. It heralds a new era in robotics in which robots can better understand and interact with the complex physical world around them, he added.