SABER.
A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation — the first high-fidelity retail robotics action dataset built from natural human behavior, not teleoperation.
Domain-specific robot deployment is fundamentally a data problem. High-fidelity naturalistic human behavior — systematically captured and retargeted — is a scalable foundation for robot adaptation. No robot in the loop required.
arXiv
Research Paper
Dataset
SABER-10K on Hugging Face
Benchmark Results
RoboBenchMart evaluation
Need the full 44.8K corpus or custom capture?
Contact Sales
Complete SABER Capture Pipeline
Synchronized dual-stream footage (egocentric video and a 360° exocentric view) with hand landmarks, body skeleton, and SMPL mesh, all derived from the same real in-store human actions.
Distinct Skill Distribution
Articulated object interaction, multi-height shelf reaching, basket loading, floor retrieval, and context-dependent placement — all repeated across hundreds of SKUs in layouts no lab can replicate.
Long-Tail Scene Variation
Dense shelves, active restocking, occlusions, varied lighting, reflective packaging, and product deformability create real-world complexity that generic datasets cannot approximate.
Repetition Matters
A model must see skill families repeatedly across contexts — grasping bottles from different shelf heights, opening fridges from varied approach angles — to achieve reliable deployment.
LAPA Latent Actions
Embodiment-agnostic motion tokens derived via inverse-dynamics encoding from egocentric video. Captures whole-arm motion, reach trajectories, and grasping dynamics without robot joint labels.
Dexterous Hand Retargets
21-point hand landmarks estimated, human-corrected frame-by-frame, then retargeted to robot joint space via Dex-Retargeting. Provides explicit finger-level precision supervision.
Whole-Body Retargets
SMPL body parameters estimated from the 360° ALIA view, human-corrected, and retargeted to the Unitree G1 humanoid. Provides torso-arm-leg coordination for floor retrieval and extended reach.
In-Store Capture
100+ hours across multiple real grocery stores with head-mounted GoPro + DreamVu ALIA 360°
Action Extraction
LAPA encoding, hand pose estimation, and SMPL body estimation with human QC annotation
Robot Retargeting
Dex-Retargeting to robot hand joint space + SMPL-to-Unitree G1 whole-body retargeting
VLA Post-Training
Shared-backbone multi-task training on GR00T N1.6 with flow-matching objective
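As a rough illustration of what a flow-matching objective optimizes, the sketch below uses the generic textbook form: interpolate between a noise sample and a ground-truth action, then regress the model's output onto the velocity of that path. It is not GR00T N1.6's implementation. The function names, the 3-dimensional action, and the zero-velocity placeholder model are all assumptions for illustration, and sign and time conventions differ between implementations.

```python
import random


def interpolate(noise, action, t):
    """Point on the straight path from noise (t=0) to the action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def target_velocity(noise, action):
    """Constant velocity of that straight path: d x_t / d t = action - noise."""
    return [a - n for n, a in zip(noise, action)]


def flow_matching_loss(predict_velocity, action, rng):
    """Mean squared error between predicted and target velocity for one sample."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()  # uniform in [0, 1)
    x_t = interpolate(noise, action, t)
    target = target_velocity(noise, action)
    predicted = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(predicted, target)) / len(action)


# Hypothetical 3-dimensional action and a placeholder "model" that always
# predicts zero velocity; a real policy would be a trained network.
rng = random.Random(0)
action = [0.2, -0.1, 0.4]
loss = flow_matching_loss(lambda x_t, t: [0.0] * len(x_t), action, rng)
```

In a real training loop the placeholder would be replaced by the policy network, conditioned on observations and language, and the loss would be averaged over a batch.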
Retail Task Cycles
Pushing trolleys, packing goods, arranging goods, opening doors, inspecting labels, and handling baskets.
Retail Task Cycles
Placing and moving foods, scooping loose goods, inspecting deformable packets, carrying multiple goods, inspecting fruits, closing doors, and placing goods.
SABER-MM vs. RoboBenchMart fine-tuning only
| Task | Category | Baseline (RoboBenchMart fine-tuning only) | SABER-MM | Relative change |
|---|---|---|---|---|
| fridge (avg open + close) | Fridge | 0.43 | 0.91 | +112% |
| board_to_board_duff | Board | 0.10 | 0.10 | — |
| board_to_board_nestle | Board | 0.02 | 0.02 | — |
| board_to_board_vanish | Board | 0.02 | 0.11 | +450% |
| pick_from_floor_beans | Floor | 0.04 | 0.17 | +325% |
| pick_from_floor_slam | Floor | 0.02 | 0.17 | +750% |
| pick_to_basket_fanta | Basket | 0.08 | 0.19 | +138% |
| pick_to_basket_nivea | Basket | 0.08 | 0.21 | +163% |
| pick_to_basket_stars | Basket | 0.12 | 0.14 | +17% |
| Mean (all tasks) | | 0.134 | 0.293 | +119% |
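The change column and the mean row can be checked with a few lines of arithmetic. The sketch below recomputes the relative change for each task as (SABER-MM − baseline) / baseline, rounded half-up to a whole percent. The page does not say how the means are weighted. The reported values of 0.134 and 0.293 are reproduced if the averaged fridge row counts as two tasks (open and close), so that weighting is used here as an assumption inferred from the numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

# (baseline, SABER-MM, weight) copied from the table above.
# The weight of 2 for the fridge row is an assumption: it treats fridge open
# and fridge close as two tasks that each score the averaged value.
RESULTS = {
    "fridge (avg open + close)": ("0.43", "0.91", 2),
    "board_to_board_duff": ("0.10", "0.10", 1),
    "board_to_board_nestle": ("0.02", "0.02", 1),
    "board_to_board_vanish": ("0.02", "0.11", 1),
    "pick_from_floor_beans": ("0.04", "0.17", 1),
    "pick_from_floor_slam": ("0.02", "0.17", 1),
    "pick_to_basket_fanta": ("0.08", "0.19", 1),
    "pick_to_basket_nivea": ("0.08", "0.21", 1),
    "pick_to_basket_stars": ("0.12", "0.14", 1),
}


def percent_change(baseline, adapted):
    """Relative change in percent, rounded half-up to a whole number."""
    b, a = Decimal(baseline), Decimal(adapted)
    change = (a - b) / b * 100
    return int(change.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def weighted_mean(column):
    """Weighted mean of one score column (0 = baseline, 1 = SABER-MM)."""
    total = sum(Decimal(row[column]) * row[2] for row in RESULTS.values())
    count = sum(row[2] for row in RESULTS.values())
    return total / count


changes = {task: percent_change(b, a) for task, (b, a, _) in RESULTS.items()}
mean_baseline = weighted_mean(0)
mean_saber = weighted_mean(1)
mean_change = percent_change(mean_baseline, mean_saber)
```

Decimal arithmetic is used instead of floats so that borderline values such as 162.5% round the same way as in the table.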
Human Video Scales Where Teleoperation Can't
SABER demonstrates that high-fidelity naturalistic human behavior, systematically captured and retargeted, is a viable and scalable foundation for domain-specific robot adaptation — without a robot in the loop.
Three Streams Are Complementary
LAPA tokens capture whole-arm trajectory, Dex-Retargeting provides finger-level precision, and body retargets supply torso-arm-leg coordination. Together they provide non-overlapping kinematic information.
Robot-Native Anchor Stabilizes Training
The 4,800-sample robot-native anchor set proved necessary to stabilize early training even at SABER's scale, suggesting that a general manipulation signal matters for robust convergence.
Task Progress Beyond Binary Success
SABER-MM teaches models to progress further through each task sequence: mean P≥2/3 is 0.445 versus 0.278 for the baseline, indicating that reaching and grasping are well learned while placement remains the frontier.
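The page does not define P≥2/3. One plausible reading is the share of episodes that complete at least two thirds of a task's stages. The sketch below computes that quantity under this assumption. The three-stage breakdown (for example reach, grasp, place), the function name, and the episode logs are all hypothetical and are not taken from the benchmark.

```python
from fractions import Fraction


def progress_rate(stages_completed, total_stages=3, threshold=Fraction(2, 3)):
    """Share of episodes whose progress (completed / total stages) meets the threshold."""
    if not stages_completed:
        raise ValueError("need at least one episode")
    hits = sum(
        1 for done in stages_completed
        if Fraction(done, total_stages) >= threshold
    )
    return hits / len(stages_completed)


# Hypothetical episode logs: number of stages completed out of 3.
episodes = [3, 2, 1, 0, 2, 3, 1, 2]

partial = progress_rate(episodes)                        # at least 2 of 3 stages
success = progress_rate(episodes, threshold=Fraction(1))  # all stages (binary success)
```

Under this reading, a policy can raise its progress score well before binary success moves, which fits the observation that placement is the remaining bottleneck. For the reported figures, (0.445 − 0.278) / 0.278 is a relative gain of roughly 60%.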
@article{dreamvu2026saber,
  title  = {SABER: A Scalable Action-Based Embodied Dataset
            for Real-World VLA Adaptation},
  author = {Menga, Narsimha and Sakurikar, Parikshit and Rouhi, Amirreza
            and Reddy, Satya Sai and Govil, Anirudh and Chittajallu, Sri Harsha
            and Aggarwal, Rajat and Namboodiri, Anoop and Reddi, Sashi},
  year   = {2026},
  month  = {May},
  note   = {DreamVu Inc.},
  url    = {https://dreamvu.ai/saber}
}