Multimodal Robot Learning
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Published: 11/1/2024
Tags: Vision-Language-Action Model, Generalist Robot Policies, Multimodal Robot Learning, LLM-guided motion planning
This work introduces $π_0$, which combines a pretrained vision-language model with flow matching for precise multi-robot control, enabling zero-shot, language-driven dexterous tasks and improved generalization across diverse platforms.
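As a rough illustration of the flow-matching idea the summary refers to, here is a minimal sketch of a conditional flow-matching objective for an action head conditioned on a vision-language embedding. This is not the paper's implementation; the module name, dimensions, and the linear-interpolation path are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): conditional flow matching for an
# action head, assuming a frozen VLM encoder that yields `obs_emb`.
import torch
import torch.nn as nn

class ActionFlowHead(nn.Module):
    """Predicts the velocity field v(a_t, t | obs) over an action chunk."""
    def __init__(self, obs_dim=512, act_dim=7, horizon=16, hidden=256):
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + horizon * act_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, horizon * act_dim),
        )

    def forward(self, obs_emb, noisy_actions, t):
        x = torch.cat([obs_emb, noisy_actions.flatten(1), t[:, None]], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)

def flow_matching_loss(model, obs_emb, actions):
    """Linear-path flow matching: learn the velocity that moves noise to data."""
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], device=actions.device)
    # Interpolate between noise (t=0) and expert actions (t=1);
    # the target velocity along this path is (actions - noise).
    a_t = (1 - t)[:, None, None] * noise + t[:, None, None] * actions
    v_pred = model(obs_emb, a_t, t)
    return ((v_pred - (actions - noise)) ** 2).mean()
```

At inference time, one would integrate the learned velocity field from a noise sample toward an action chunk; the details of that integration are not specified here.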
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Published: 3/13/2024
Tags: 3D Gaussian Splatting representation, Multimodal Robot Learning, Multi-Task Robotic Manipulation, Future Scene Reconstruction, Dynamic Semantic Propagation
ManiGaussian uses dynamic Gaussian splatting and future scene reconstruction to capture spatiotemporal dynamics for multi-task robotic manipulation, outperforming state-of-the-art methods by 13.1% in success rate on RLBench benchmarks.
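To make the "dynamic Gaussian splatting with future scene reconstruction" idea concrete, here is a heavily simplified sketch: a head that predicts per-Gaussian parameter deltas given an action, supervised by rendering the predicted next frame. The class, the additive update, and the renderer placeholder are all assumptions, not the paper's architecture.

```python
# Minimal sketch (not the authors' code) of a dynamic Gaussian world model:
# predict how Gaussian parameters move under an action and supervise with
# future-frame reconstruction. `render_fn` stands in for a splatting renderer.
import torch
import torch.nn as nn

class GaussianDynamicsHead(nn.Module):
    """Predicts per-Gaussian deltas (position, rotation, opacity) given an action."""
    def __init__(self, feat_dim=64, act_dim=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 128), nn.GELU(),
            nn.Linear(128, 3 + 4 + 1),  # xyz delta, quaternion delta, opacity delta
        )

    def forward(self, gaussian_feats, action):
        a = action[None].expand(gaussian_feats.shape[0], -1)
        return self.mlp(torch.cat([gaussian_feats, a], dim=-1))

def future_reconstruction_loss(render_fn, gaussians, deltas, next_frame):
    """Apply predicted deltas, render the predicted next frame, compare to ground truth."""
    predicted_gaussians = gaussians + deltas  # simplification: purely additive update
    rendered = render_fn(predicted_gaussians)  # placeholder for a Gaussian splatting renderer
    return torch.mean((rendered - next_frame) ** 2)
```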
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies
Tags: LLM-guided motion planning, Multimodal Robot Learning, Multi-modal action representation and modeling, Large-Scale Robot Demonstration Dataset, Generalist Robot Policies
UMI-on-Air uses human demonstrations and an Embodiment-Aware Diffusion Policy (EADP) to guide embodiment-agnostic visuomotor policies on robots with constrained embodiments, improving adaptability, success rates, and robustness across different embodiments.
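One common way to realize embodiment-aware guidance of a diffusion policy is to steer the denoising steps with the gradient of an embodiment-specific feasibility cost. The sketch below assumes that style of guidance for illustration only; the cost function, guidance scale, and the `denoise_step` policy interface are hypothetical and not taken from the paper.

```python
# Minimal sketch (not the paper's EADP implementation): cost-gradient guidance
# of a diffusion policy toward actions feasible for the target embodiment.
import torch

def embodiment_cost(actions, joint_limits):
    """Penalize action chunks that exceed the target embodiment's joint limits."""
    lo, hi = joint_limits
    return (torch.relu(actions - hi) + torch.relu(lo - actions)).pow(2).sum()

@torch.no_grad()
def guided_sampling(policy, obs, joint_limits, steps=50, guidance_scale=1.0):
    """Reverse diffusion with gradient-based steering toward the feasible set."""
    actions = torch.randn(1, policy.horizon, policy.act_dim)
    for k in reversed(range(steps)):
        actions = policy.denoise_step(actions, obs, k)  # assumed policy API
        with torch.enable_grad():
            a = actions.detach().requires_grad_(True)
            cost = embodiment_cost(a, joint_limits)
            grad = torch.autograd.grad(cost, a)[0]
        actions = actions - guidance_scale * grad  # nudge toward feasible actions
    return actions
```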
ActiveUMI: Robotic Manipulation with Active Perception from Robot-Free Human Demonstrations
Published: 10/2/2025
Tags: Multimodal Robot Learning, Bimanual Dynamic Manipulation Demonstrations, LLM-guided motion planning, Active Perception in Robotic Manipulation, VR Teleoperation Data Collection
ActiveUMI combines portable VR teleoperation with sensorized controllers to capture active egocentric perception, enabling precise human-robot alignment for complex bimanual tasks, achieving a 70% success rate and strong generalization in novel scenarios.
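As one concrete piece of the human-robot alignment the summary mentions, the sketch below shows how a handheld controller pose might be expressed in the robot base frame via rigid-transform composition. The 4x4 homogeneous-matrix convention, frame names, and the existence of a calibrated controller-to-tool offset are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: map a VR controller pose (world frame) into the robot base
# frame so it can serve as a commanded end-effector pose.
import numpy as np

def to_homogeneous(rotation: np.ndarray, translation: np.ndarray) -> np.ndarray:
    """Build a 4x4 rigid transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def controller_to_robot(T_world_controller: np.ndarray,
                        T_world_robotbase: np.ndarray,
                        T_controller_tool: np.ndarray) -> np.ndarray:
    """Express the tool (gripper) pose in the robot base frame.

    Inverting T_world_robotbase maps world coordinates into the robot base
    frame; composing with the controller pose and a calibrated
    controller-to-tool offset yields the commanded end-effector pose.
    """
    return np.linalg.inv(T_world_robotbase) @ T_world_controller @ T_controller_tool
```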