Tag Filter: Vision-Language-Action Model
GraspVLA: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data
Published: 5/6/2025
Vision-Language-Action Model · Synthetic Data Grasping Model · Large-Scale Synthetic Action Dataset · Autoregressive Perception Tasks · Chain-of-Thought Process
GraspVLA is a foundation model pretrained on a billion frames of synthetic action data, addressing the reliance on real-world data. By creating the SynGrasp-1B dataset and integrating autoregressive perception with flow-matching action generation, it achieves efficient zero-shot generalization…
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Published: 12/11/2025
Whole-Body Humanoid Robot Control · Vision-Language-Action Model · Robotic Action Learning · Action Learning from Low-Cost Videos · Loco-Manipulation-Oriented Reinforcement Learning
This study presents WholeBodyVLA, a unified latent vision-language-action framework that improves humanoid robots' performance on loco-manipulation tasks. It learns from low-cost egocentric videos and employs a tailored reinforcement learning policy, achieving a 21.3% performance b…
$π^{*}_{0.6}$: a VLA That Learns From Experience
Published: 11/19/2025
Vision-Language-Action Model · RL Training for Large Language Models · Experience-Based Reinforcement Learning · Robotic Data Collection and Optimization · Advantage-Conditioned Policies
The study presents RECAP, a method for training Vision-Language-Action models through real-world learning. The $π^{*}_{0.6}$ model, pretrained using offline reinforcement learning, demonstrates significant performance improvements on various tasks, such as laundry folding and es…
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Published: 11/11/2025
Motion Tracking Foundation Model · Natural Humanoid Control · Large-Scale Motion Capture Dataset · Real-Time Motion Planning · Vision-Language-Action Model
The SONIC framework scales model capacity, data, and compute for natural humanoid control, using diverse motion-capture data as dense supervision. It supports real-time motion planning and multiple control interfaces, and demonstrates significant performance gains from scaling.