Published · May 2026

SABER.

A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation — the first high-fidelity retail robotics action dataset built from natural human behavior, not teleoperation.

The Core Claim

Domain-specific robot deployment is fundamentally a data problem. High-fidelity naturalistic human behavior — systematically captured and retargeted — is a scalable foundation for robot adaptation. No robot in the loop required.

44.8K
Training Samples
100+
Hours Captured
2.19×
Improvement
Why Retail Demands Its Own Data
Modern VLAs like GR00T N1.6 achieve near-zero success on retail tasks out of the box, not because the models are weak, but because the retail domain is entirely absent from their training data.

Distinct Skill Distribution

Articulated object interaction, multi-height shelf reaching, basket loading, floor retrieval, and context-dependent placement — all repeated across hundreds of SKUs in layouts no lab can replicate.

Long-Tail Scene Variation

Dense shelves, active restocking, occlusions, varied lighting, reflective packaging, and product deformability create real-world complexity that generic datasets cannot approximate.

Repetition Matters

A model must see skill families repeatedly across contexts — grasping bottles from different shelf heights, opening fridges from varied approach angles — to achieve reliable deployment.

Key Results at a Glance
2.19×
Improvement over fine-tuning baselines on RoboBenchMart
29.3%
Mean success rate across all 10 retail manipulation tasks
91%
Average fridge task success — up from 43% baseline
100%
Non-robot data — entire dataset captured from human video alone
44.8K
Total Samples
100+
Capture Hours
3
Action Streams
10
Eval Tasks
Three Complementary Action Streams
From the same dual-camera in-store captures, three distinct supervision signals are derived — each encoding a different level of kinematic abstraction.
Stream 1

LAPA Latent Actions

25K

Embodiment-agnostic motion tokens derived via inverse-dynamics encoding from egocentric video. Captures whole-arm motion, reach trajectories, and grasping dynamics without robot joint labels.

Egocentric GoPro
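
To make the latent-action idea concrete, here is a minimal sketch of a LAPA-style inverse-dynamics tokenizer. The tiny architecture, names, and dimensions are illustrative assumptions, not the SABER implementation: the point is only that a discrete motion token is read off a pair of consecutive frames, with no robot joint labels involved.

```python
import torch
import torch.nn as nn

class LatentActionTokenizer(nn.Module):
    """Toy inverse-dynamics encoder: frame pair -> discrete motion token."""
    def __init__(self, codebook_size: int = 256, dim: int = 128):
        super().__init__()
        # Small conv encoder over the stacked frame pair (6 = 2 x RGB).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        # VQ codebook: each row is one embodiment-agnostic motion token.
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, frame_t, frame_t1):
        z = self.encoder(torch.cat([frame_t, frame_t1], dim=1))
        # Nearest codebook entry gives the discrete latent action id.
        return torch.cdist(z, self.codebook.weight).argmin(dim=-1)

tok = LatentActionTokenizer()
f0, f1 = torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224)
print(tok(f0, f1))  # e.g. tensor([87]): one latent action token
```
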
Stream 2

Dexterous Hand Retargets

18.6K

21-point hand landmarks estimated, human-corrected frame-by-frame, then retargeted to robot joint space via Dex-Retargeting. Provides explicit finger-level precision supervision.

Egocentric GoPro
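
The retargeting step can be pictured as a small per-frame optimization. The sketch below is our own simplification, with an invented 16-DoF hand and placeholder forward kinematics; the actual pipeline uses the Dex-Retargeting library, which solves a similar fingertip-matching problem with temporal smoothing.

```python
import numpy as np
from scipy.optimize import minimize

N_JOINTS = 16                     # hypothetical robot hand DoF
FINGERTIPS = [4, 8, 12, 16, 20]   # fingertip indices in a 21-point layout

def robot_fingertips(q: np.ndarray) -> np.ndarray:
    # Placeholder forward kinematics: substitute the real hand model here.
    return q[:15].reshape(5, 3)

def retarget_frame(landmarks: np.ndarray, q_prev: np.ndarray) -> np.ndarray:
    target = landmarks[FINGERTIPS]              # (5, 3) human fingertips
    def cost(q):
        err = robot_fingertips(q) - target      # fingertip position error
        smooth = 1e-2 * np.sum((q - q_prev) ** 2)  # temporal smoothness
        return np.sum(err ** 2) + smooth
    res = minimize(cost, q_prev, method="L-BFGS-B",
                   bounds=[(-1.5, 1.5)] * N_JOINTS)  # joint limits
    return res.x  # robot hand joint angles for this frame

q = retarget_frame(np.random.rand(21, 3), np.zeros(N_JOINTS))
```
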
Stream 3

Whole-Body Retargets

1.2K

SMPL body parameters estimated from the 360° ALIA view, human-corrected, and retargeted to the Unitree G1 humanoid. Provides torso-arm-leg coordination for floor retrieval and extended reach.

Exocentric ALIA 360°
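
In the same spirit, a heavily simplified view of the whole-body step: SMPL yields per-joint axis-angle rotations, and a retarget maps matched joints onto the robot and clamps them to its limits. The joint correspondence, limits, and one-DoF approximation below are all invented for illustration; they are not the SABER mapping to the Unitree G1.

```python
import numpy as np

# Hypothetical correspondence: SMPL joint index -> robot joint name.
SMPL_TO_G1 = {16: "left_shoulder_pitch", 17: "right_shoulder_pitch",
              18: "left_elbow", 19: "right_elbow",
              1: "left_hip_pitch", 2: "right_hip_pitch"}
G1_LIMITS = {name: (-2.0, 2.0) for name in SMPL_TO_G1.values()}  # invented

def retarget_body(smpl_pose: np.ndarray) -> dict:
    # smpl_pose: (24, 3) axis-angle per SMPL joint; here we use the
    # rotation magnitude as a crude one-DoF approximation per joint.
    q = {}
    for idx, joint in SMPL_TO_G1.items():
        angle = float(np.linalg.norm(smpl_pose[idx]))
        lo, hi = G1_LIMITS[joint]
        q[joint] = float(np.clip(angle, lo, hi))
    return q

print(retarget_body(np.random.randn(24, 3) * 0.3))
```
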
From Store Footage to Robot Training
SABER is constructed from a dual-stream capture architecture — egocentric GoPro + exocentric ALIA 360° — across multiple real grocery stores.
1

In-Store Capture

100+ hours across multiple real grocery stores with head-mounted GoPro + DreamVu ALIA 360°

2

Action Extraction

LAPA encoding, hand pose estimation, and SMPL body estimation with human QC annotation

3

Robot Retargeting

Dex-Retargeting to robot hand joint space + SMPL-to-Unitree G1 whole-body retargeting

4

VLA Post-Training

Shared-backbone multi-task training on GR00T N1.6 with flow-matching objective
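
For readers unfamiliar with the objective in step 4, here is a minimal flow-matching sketch. The toy head, shapes, and straight-line probability path are assumptions for illustration; GR00T N1.6's actual action head is not reproduced here. The model regresses the constant velocity that carries Gaussian noise to the ground-truth action chunk along a linear path.

```python
import torch
import torch.nn as nn

class VelocityHead(nn.Module):
    """Toy stand-in for a VLA action head: predicts v(x_t, t, obs)."""
    def __init__(self, act_dim: int = 16, obs_dim: int = 32, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim + obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, x_t, t, obs):
        return self.net(torch.cat([x_t, t, obs], dim=-1))

def flow_matching_loss(head, actions, obs):
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], 1)   # one time per sample in [0, 1)
    x_t = (1 - t) * noise + t * actions   # point on the straight-line path
    target = actions - noise              # constant velocity along that path
    return ((head(x_t, t, obs) - target) ** 2).mean()

head = VelocityHead()
loss = flow_matching_loss(head, torch.randn(8, 16), torch.randn(8, 32))
loss.backward()  # trains the head to regress the transport velocity
```

At inference time, actions are generated by integrating the learned velocity field from a noise sample to an action chunk.
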

Capture Sessions & Task Annotations
Annotated in-store capture footage from the SABER dataset — showing retail manipulation tasks with action labels and multi-scene diversity.
RoboBenchMart Results
SABER-MM post-training on GR00T N1.6 evaluated across 10 retail manipulation tasks spanning fridge, board-to-board, floor pick, and basket pick categories.
[Chart: per-category success rates, SABER-MM vs. RoboBenchMart fine-tuning only. Mean success 29.3% vs. 13.4%; fridge 91% vs. 43%; floor pick 17% vs. 3%. Net result: a 2.19× mean improvement over baseline.]
| Task | Category | Baseline (RBM FT) | SABER-MM | Change |
| --- | --- | --- | --- | --- |
| fridge (avg open + close) | Fridge | 0.43 | 0.91 | +112% |
| board_to_board_duff | Board | 0.10 | 0.10 | 0% |
| board_to_board_nestle | Board | 0.02 | 0.02 | 0% |
| board_to_board_vanish | Board | 0.02 | 0.11 | +450% |
| pick_from_floor_beans | Floor | 0.04 | 0.17 | +325% |
| pick_from_floor_slam | Floor | 0.02 | 0.17 | +750% |
| pick_to_basket_fanta | Basket | 0.08 | 0.19 | +138% |
| pick_to_basket_nivea | Basket | 0.08 | 0.21 | +163% |
| pick_to_basket_stars | Basket | 0.12 | 0.14 | +17% |
| Mean (all 10 tasks) | | 0.134 | 0.293 | +119% |
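
As a quick sanity check, the headline numbers follow from the per-task rows above, assuming the fridge row stands for two of the ten tasks (open and close, as its label suggests):

```python
# Per-task success rates from the table; fridge entered twice (open + close).
baseline = [0.43, 0.43, 0.10, 0.02, 0.02, 0.04, 0.02, 0.08, 0.08, 0.12]
saber_mm = [0.91, 0.91, 0.10, 0.02, 0.11, 0.17, 0.17, 0.19, 0.21, 0.14]
mean_b = sum(baseline) / len(baseline)   # 0.134
mean_s = sum(saber_mm) / len(saber_mm)   # 0.293
print(f"{mean_b:.3f} -> {mean_s:.3f}  ({mean_s / mean_b:.2f}x)")  # 2.19x
```
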
SABER-MM Data Composition
The post-training corpus combines SABER's three streams with robot-native anchor data and task-aligned demonstrations — totaling ~52.1K samples.
52.1K
Total Samples
SABER — LAPA Latent Actions
25K samples · Egocentric video
48.0%
SABER — Hand Retargets
18.6K samples · Dex-Retargeting
35.7%
SABER — Body Retargets
1.2K samples · Unitree G1
2.3%
NVIDIA Robot Data
4.8K samples · Anchor signal
9.2%
RoboBenchMart
2.5K samples · Task-aligned
4.8%
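
The percentages follow directly from the sample counts. A minimal sketch (the stream keys are ours; the training-time sampler itself is not specified on this page):

```python
# Published sample counts per data source in the SABER-MM mixture.
counts = {
    "saber_lapa":    25_000,  # LAPA latent actions
    "saber_hand":    18_600,  # Dex-Retargeting hand retargets
    "saber_body":     1_200,  # Unitree G1 whole-body retargets
    "nvidia_robot":   4_800,  # robot-native anchor
    "robobenchmart":  2_500,  # task-aligned demos
}
total = sum(counts.values())  # 52_100
for name, n in counts.items():
    print(f"{name:15s} {n:6d}  {100 * n / total:4.1f}%")
```
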
What SABER Demonstrates
Finding 01

Human Video Scales Where Teleoperation Can't

SABER demonstrates that high-fidelity naturalistic human behavior, systematically captured and retargeted, is a viable and scalable foundation for domain-specific robot adaptation — without a robot in the loop.

Finding 02

Three Streams Are Complementary

LAPA tokens capture whole-arm trajectory, Dex-Retargeting provides finger-level precision, and body retargets supply torso-arm-leg coordination. Together they provide non-overlapping kinematic information.

Finding 03

Robot-Native Anchor Stabilizes Training

The 4,800-sample robot-native anchor proved necessary to stabilize early training even at SABER's scale, suggesting that a small amount of general, robot-native manipulation signal matters for robust convergence.

Finding 04

Task Progress Beyond Binary Success

SABER-MM teaches models to progress further through each task sequence (mean P≥2/3 of 0.445 vs. 0.278 for the baseline), indicating that reaching and grasping are well learned while final placement remains the frontier.
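
A hedged sketch of how such a stage-progress metric could be computed: we read P≥2/3 as the fraction of evaluation episodes that complete at least two of a task's three annotated stages (e.g. reach, grasp, place). This interpretation is ours; RoboBenchMart's exact definition may differ.

```python
def p_at_least_2_of_3(episode_stages: list[int]) -> float:
    # episode_stages: number of stages completed (0 to 3) per episode.
    return sum(s >= 2 for s in episode_stages) / len(episode_stages)

print(p_at_least_2_of_3([3, 2, 1, 2, 0, 3, 2, 1]))  # 0.625
```
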

Cite This Work
@article{dreamvu2026saber,
  title   = {SABER: A Scalable Action-Based Embodied Dataset
             for Real-World VLA Adaptation},
  author  = {Menga, Narsimha and Sakurikar, Parikshit and Rouhi, Amirreza
             and Reddy, Satya Sai and Govil, Anirudh and Chittajallu, Sri Harsha
             and Aggarwal, Rajat and Namboodiri, Anoop and Reddi, Sashi},
  year    = {2026},
  month   = {May},
  note    = {DreamVu Inc.},
  url     = {https://dreamvu.ai/saber}
}

Ready to Build the Data Layer for Retail Robots?

The SABER-10K subset is available now on Hugging Face. Full dataset and custom capture at dreamvu.ai/saber.