Community Release · AgiBot World Challenge 2026

Retail-VLA-10K
Dataset by DreamVu

A large-scale egocentric video dataset of human manipulation actions in real retail environments, curated by DreamVu for robot training. Formatted for LeRobot v2.1 and free for the research community.

10,123
Episodes
across 11 skills
3.09M
Total Frames
of real-world video
11
Skill Categories
manipulation tasks
30fps
Video Rate
640×480 H.264

For Challenge Participants

Why use Retail-VLA-10K for
the AgiBot World Challenge?

The official Reasoning2Action track includes retail operations as a core task. Here is why this dataset gives your team a concrete edge.

01 ·· DIRECT TASK OVERLAP

Retail Skills That Match the Challenge Tasks

The challenge tasks stock_and_straighten_shelf and take_wrong_item_shelf map directly to skills in this dataset — Placing on Shelf (1,267 episodes), Picking Up Item (1,550 episodes), Reaching (1,593 episodes), and Grasping (1,590 episodes). This is task-aligned demonstration data, not generic manipulation footage.

02 ·· SIM-TO-REAL

Real-World Data to Complement AgiBot's Sim Dataset

The official Reasoning2Action dataset is simulation-based (Genie Sim 3.0). Retail-VLA-10K is curated from real retail environments — real lighting, real product diversity, real shelf clutter. Using both together directly addresses the Sim2Real gap that the challenge is built around.

03 ·· ZERO FRICTION

Same LeRobot v2.1 Format as the Official Dataset

The official AgiBot challenge dataset uses the LeRobot v2.1 layout (meta / data / videos). Retail-VLA-10K uses the exact same structure. There is no reformatting, no custom dataloaders — you can mix and augment your training set immediately and spend time on model development instead.

04 ·· SCALE

10,000+ Episodes Ready to Train On

With 10,000+ episodes and 3M+ frames, this dataset is large enough to meaningfully pretrain or fine-tune a VLA policy — not just evaluate one. Teams that start with more high-quality demonstration data have a measurable head start on generalization.

05 ·· ACTION ENCODING

LAPA Latent Actions — No Proprioception Required

Actions are encoded using LAPA (Latent Action Pretraining from Videos) — a codebook-based quantization model. You do not need proprioceptive robot data to benefit: LAPA lets you train on this curated human-demonstration video directly for latent action pretraining. The encoding is hardware-agnostic and compatible with the G2 robot setup.

06 ·· OPEN ACCESS

Free, Immediate, No Paperwork

Released under CC BY-NC 4.0 — no access requests, no waiting, no gating. Download and start training today. The only condition is non-commercial use, which covers all research and challenge participation.


AgiBot World Challenge 2026 · Track 1

Reasoning2Action — Task Alignment

See how Retail-VLA-10K maps to the 10 official challenge tasks in the Reasoning2Action track.

Track 1 evaluates models across 10 progressively challenging manipulation tasks, ranging from basic to complex — including retail operations, logistics sorting, and long-horizon skills.

Retail-VLA-10K directly supports the two retail-specific tasks and provides the core manipulation primitives — grasping, reaching, placing — that underpin performance across the entire track.

Bottom line: If your team is training for Track 1, you need exposure to real retail manipulation at scale. The official dataset is simulation-only. Ours fills that gap — same format, real environments, task-matched skills.
| Challenge Task | Dataset Match |
|---|---|
| stock_and_straighten_shelf | Direct Match |
| take_wrong_item_shelf | Direct Match |
| sorting_packages | Primitives |
| sorting_packages_continuous | Primitives |
| place_block_into_box | Primitives |
| hold_pot | Primitives |
| clean_the_desktop | General |
| open_door | General |
| pour_workpiece | General |
| scoop_popcorn | General |

Dataset Composition

11 Retail Manipulation Skills

Every episode is captured from a first-person egocentric perspective, designed to match the natural viewpoint of a deployed robot.

| Skill | Dataset ID | Episodes | Frame Volume |
|---|---|---|---|
| Grasping | manipulation_grasping | 1,590 | 484,619 |
| Reaching | manipulation_reaching | 1,593 | 467,868 |
| Holding | manipulation_holding | 1,558 | 488,087 |
| Picking Up Item | manipulation_picking_up_item | 1,550 | 473,327 |
| Cart Pushing | manipulation_cart_pushing | 1,180 | 354,508 |
| Placing on Shelf | manipulation_placing_item_on_shelf | 1,267 | 395,428 |
| Placing in Cart | manipulation_placing_item_in_cart | 423 | 137,514 |
| Lifting | manipulation_lifting | 445 | 137,354 |
| Object Manipulation | manipulation_object_manipulation | 188 | 55,303 |
| Placing in Basket | manipulation_placing_item_in_basket | 153 | 47,938 |
| Holding Item | manipulation_holding_item | 176 | 53,760 |
| **Total** | 11 skills | 10,123 | 3,095,706 |
LeRobot v2.1 · LAPA Latent Actions · Codebook Size 8 · Seq Len 4 · Real Retail Environments · Egocentric Capture · H.264 · 640×480 · 30fps · CC BY-NC 4.0

Technical Specifications

Format & Structure

Plug-and-play with LeRobot v2.1 pipelines — the same format as the official AgiBot challenge dataset.

🎥

Video

640×480 H.264 video at 30 fps. Each skill is a self-contained sub-directory with meta/, data/, and videos/ folders matching the LeRobot v2.1 layout exactly.
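Put together, the layout described above looks like the sketch below. Only the meta/, data/, and videos/ folder names and the four metadata files come from this page; the skill shown and the chunk and episode file names are illustrative.

```text
manipulation_grasping/              # one self-contained LeRobot v2.1 dataset per skill
├── meta/
│   ├── info.json
│   ├── episodes.jsonl
│   ├── episodes_stats.jsonl
│   └── tasks.jsonl
├── data/
│   └── chunk-000/                  # chunk naming is illustrative
│       └── episode_000000.parquet
└── videos/
    └── chunk-000/
        └── ...                     # 640×480 H.264 video at 30 fps
```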

Action Encoding

Actions are encoded via LAPA (Latent Action Pretraining from Videos): 4 latent action indices per frame, codebook size 8, sequence length 4. No proprioceptive labels required.
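As a concrete sketch of what "4 indices per frame, codebook size 8" implies, the snippet below validates a per-frame latent action and packs its four base-8 indices into a single token id in [0, 8⁴). The packing scheme is our illustration for downstream tokenization, not part of the dataset format.

```python
CODEBOOK_SIZE = 8  # from the dataset card
SEQ_LEN = 4        # 4 latent action indices per frame

def validate_latent_action(indices):
    """Check that a per-frame latent action is 4 indices in [0, 8)."""
    return len(indices) == SEQ_LEN and all(0 <= i < CODEBOOK_SIZE for i in indices)

def flatten_to_token(indices):
    """Illustrative: pack 4 base-8 indices into one integer in [0, 8**4)."""
    token = 0
    for i in indices:
        token = token * CODEBOOK_SIZE + i
    return token

def token_to_indices(token):
    """Inverse of flatten_to_token."""
    out = []
    for _ in range(SEQ_LEN):
        out.append(token % CODEBOOK_SIZE)
        token //= CODEBOOK_SIZE
    return out[::-1]

example = [3, 0, 7, 2]
assert validate_latent_action(example)
tok = flatten_to_token(example)  # 3*512 + 0*64 + 7*8 + 2 = 1594
assert token_to_indices(tok) == example
```

A flat vocabulary of 8⁴ = 4,096 tokens is small enough to feed directly into a standard autoregressive policy head, which is one reason codebook-quantized actions are convenient.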

🏷️

Annotations

Episodes include language task annotations and standard LeRobot metadata — episodes.jsonl, episodes_stats.jsonl, info.json, and tasks.jsonl.
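A minimal sketch of consuming the episodes.jsonl annotations with only the standard library. The two sample rows and their field names (episode_index, tasks, length) follow the usual LeRobot v2.x convention and are assumptions for illustration, not actual rows from this dataset.

```python
import io
import json

# Two illustrative rows standing in for meta/episodes.jsonl; field names
# follow the LeRobot v2.x convention and are an assumption here.
sample_jsonl = io.StringIO(
    '{"episode_index": 0, "tasks": ["pick up item from shelf"], "length": 305}\n'
    '{"episode_index": 1, "tasks": ["place item on shelf"], "length": 312}\n'
)

# JSON Lines: one JSON object per line.
episodes = [json.loads(line) for line in sample_jsonl if line.strip()]
total_frames = sum(ep["length"] for ep in episodes)
tasks_seen = {task for ep in episodes for task in ep["tasks"]}

print(len(episodes), total_frames)  # 2 617
```

In practice you would replace the in-memory buffer with `open("meta/episodes.jsonl")` inside each skill's sub-directory.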

📂

Codebase Version

Packaged for LeRobot v2.1. Episode data is stored as .parquet files per chunk. Identical structure to the official AgiBot Reasoning2Action dataset — mix them directly.

👁️

Capture Perspective

First-person (egocentric) viewpoint throughout — matching the camera placement of humanoid and mobile manipulation robots. No viewpoint mismatch to compensate for.

📜

License

CC BY-NC 4.0 — free for research and challenge participation. No access requests, no waiting. Attribution to DreamVu required. Non-commercial use only.

Ready to close the Sim2Real gap?

Download Retail-VLA-10K on Hugging Face and start training alongside the official AgiBot dataset today.

↓ Download on Hugging Face
Talk to DreamVu →
Released under CC BY-NC 4.0 · Curated by DreamVu · dreamvu.ai