Multimodal Robot Learning
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Published: 11/1/2024
Tags: Vision-Language-Action Model, Generalist Robot Policies, Multimodal Robot Learning, LLM-guided motion planning
This work introduces $π_0$, which combines a pretrained vision-language model with flow matching for precise multi-robot control, enabling zero-shot, language-driven dexterous tasks and improved generalization across diverse platforms.
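As a rough illustration of the flow-matching idea the summary refers to, here is a minimal sketch of a conditional flow-matching objective for an action head conditioned on a vision-language embedding. This is not the paper's implementation; the module name, dimensions, and the linear-interpolation path are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): conditional flow matching for an
# action head, assuming a frozen VLM encoder that yields `obs_emb`.
import torch
import torch.nn as nn

class ActionFlowHead(nn.Module):
    """Predicts the velocity field v(a_t, t | obs) over an action chunk."""
    def __init__(self, obs_dim=512, act_dim=7, horizon=16, hidden=256):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + horizon * act_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, horizon * act_dim),
        )

    def forward(self, obs_emb, noisy_actions, t):
        x = torch.cat([obs_emb, noisy_actions.flatten(1), t[:, None]], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)

def flow_matching_loss(model, obs_emb, actions):
    """Linear-path flow matching: learn the velocity that moves noise to data."""
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], device=actions.device)
    # Interpolate between noise (t=0) and expert actions (t=1);
    # the target velocity along this path is (actions - noise).
    a_t = (1 - t)[:, None, None] * noise + t[:, None, None] * actions
    v_pred = model(obs_emb, a_t, t)
    return ((v_pred - (actions - noise)) ** 2).mean()
```

At inference time, one would integrate the learned velocity field from a noise sample toward an action chunk; the details of that integration are not specified here.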
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Published: 3/13/2024
Tags: 3D Gaussian Splatting representation, Multimodal Robot Learning, Multi-Task Robotic Manipulation, Future Scene Reconstruction, Dynamic Semantic Propagation
ManiGaussian uses dynamic Gaussian splatting and future scene reconstruction to capture spatiotemporal dynamics for multi-task robotic manipulation, outperforming state-of-the-art methods by 13.1% in success rate on RLBench benchmarks.
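To make the "dynamic Gaussian splatting with future scene reconstruction" idea concrete, here is a heavily simplified sketch: a head that predicts per-Gaussian parameter deltas given an action, supervised by rendering the predicted next frame. The class, the additive update, and the renderer placeholder are all assumptions, not the paper's architecture.

```python
# Minimal sketch (not the authors' code) of a dynamic Gaussian world model:
# predict how Gaussian parameters move under an action and supervise with
# future-frame reconstruction. `render_fn` stands in for a splatting renderer.
import torch
import torch.nn as nn

class GaussianDynamicsHead(nn.Module):
    """Predicts per-Gaussian deltas (position, rotation, opacity) given an action."""
    def __init__(self, feat_dim=64, act_dim=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 128), nn.GELU(),
            nn.Linear(128, 3 + 4 + 1),  # xyz delta, quaternion delta, opacity delta
        )

    def forward(self, gaussian_feats, action):
        a = action[None].expand(gaussian_feats.shape[0], -1)
        return self.mlp(torch.cat([gaussian_feats, a], dim=-1))

def future_reconstruction_loss(render_fn, gaussians, deltas, next_frame):
    """Apply predicted deltas, render the predicted next frame, compare to ground truth."""
    predicted_gaussians = gaussians + deltas  # simplification: purely additive update
    rendered = render_fn(predicted_gaussians)  # placeholder for a Gaussian splatting renderer
    return torch.mean((rendered - next_frame) ** 2)
```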
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies
Tags: LLM-guided motion planning, Multimodal Robot Learning, Multi-modal action representation and modeling, Large-Scale Robot Demonstration Dataset, Generalist Robot Policies
UMI-on-Air uses human demonstrations and an Embodiment-Aware Diffusion Policy (EADP) to guide embodiment-agnostic visuomotor policies on robots with constrained embodiments, improving adaptability, success rates, and robustness across different embodiments.
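One common way to realize embodiment-aware guidance of a diffusion policy is to steer the denoising steps with the gradient of an embodiment-specific feasibility cost. The sketch below assumes that style of guidance for illustration only; the cost function, guidance scale, and the `denoise_step` policy interface are hypothetical and not taken from the paper.

```python
# Minimal sketch (not the paper's EADP implementation): cost-gradient guidance
# of a diffusion policy toward actions feasible for the target embodiment.
import torch

def embodiment_cost(actions, joint_limits):
    """Penalize action chunks that exceed the target embodiment's joint limits."""
    lo, hi = joint_limits
    return (torch.relu(actions - hi) + torch.relu(lo - actions)).pow(2).sum()

@torch.no_grad()
def guided_sampling(policy, obs, joint_limits, steps=50, guidance_scale=1.0):
    """Reverse diffusion with gradient-based steering toward the feasible set."""
    actions = torch.randn(1, policy.horizon, policy.act_dim)
    for k in reversed(range(steps)):
        actions = policy.denoise_step(actions, obs, k)  # assumed policy API
        with torch.enable_grad():
            a = actions.detach().requires_grad_(True)
            cost = embodiment_cost(a, joint_limits)
            grad = torch.autograd.grad(cost, a)[0]
        actions = actions - guidance_scale * grad  # nudge toward feasible actions
    return actions
```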
ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations
Published: 10/2/2025
Tags: Multimodal Robot Learning, Bimanual Dynamic Manipulation Demonstrations, LLM-guided motion planning, Active Perception in Robotic Manipulation, VR Teleoperation Data Collection
ActiveUMI combines portable VR teleoperation with sensorized controllers to capture active egocentric perception, enabling precise human-robot alignment for complex bimanual tasks, achieving a 70% success rate and strong generalization in novel scenarios.
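As one concrete piece of the human-robot alignment the summary mentions, the sketch below shows how a handheld controller pose might be expressed in the robot base frame via rigid-transform composition. The 4x4 homogeneous-matrix convention, frame names, and the existence of a calibrated controller-to-tool offset are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: map a VR controller pose (world frame) into the robot base
# frame so it can serve as a commanded end-effector pose.
import numpy as np

def to_homogeneous(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 rigid transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def controller_to_robot(T_world_controller: np.ndarray,
                        T_world_robotbase: np.ndarray,
                        T_controller_tool: np.ndarray) -> np.ndarray:
    """Express the tool (gripper) pose in the robot base frame.

    Inverting T_world_robotbase maps world coordinates into the robot base
    frame; composing with the controller pose and a calibrated
    controller-to-tool offset yields the commanded end-effector pose.
    """
    return np.linalg.inv(T_world_robotbase) @ T_world_controller @ T_controller_tool
```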