Retail-VLA-10K
Dataset by DreamVu
A large-scale egocentric video dataset of human manipulation actions in real retail environments, curated by DreamVu for robot training. Formatted for LeRobot v2.1 and free for the research community.
Why use Retail-VLA-10K for the AgiBot World Challenge?
The official Reasoning2Action track includes retail operations as a core task. Here is why this dataset gives your team a concrete edge.
Retail Skills That Match the Challenge Tasks
The challenge tasks stock_and_straighten_shelf and take_wrong_item_shelf map directly to skills in this dataset — Placing on Shelf (1,267 episodes), Picking Up Item (1,550 episodes), Reaching (1,593 episodes), and Grasping (1,590 episodes). This is task-aligned demonstration data, not generic manipulation footage.
Real-World Data to Complement AgiBot's Sim Dataset
The official Reasoning2Action dataset is simulation-based (Genie Sim 3.0). Retail-VLA-10K is curated from real retail environments — real lighting, real product diversity, real shelf clutter. Using both together directly addresses the Sim2Real gap that the challenge is built around.
Same LeRobot v2.1 Format as the Official Dataset
The official AgiBot challenge dataset uses the LeRobot v2.1 layout (meta / data / videos). Retail-VLA-10K uses the exact same structure. There is no reformatting, no custom dataloaders — you can mix and augment your training set immediately and spend time on model development instead.
10,000+ Episodes Ready to Train On
With 10,000+ episodes and 3M+ frames, this dataset is large enough to meaningfully pretrain or fine-tune a VLA policy — not just evaluate one. Teams that start with more high-quality demonstration data have a measurable head start on generalization.
LAPA Latent Actions — No Proprioception Required
Actions are encoded using LAPA (Latent Action Pretraining from Videos), a codebook-based quantization model. You do not need proprioceptive robot data to benefit: LAPA lets you use this human demonstration video directly for latent action pretraining. The approach is hardware-agnostic and compatible with the G2 robot setup.
Free, Immediate, No Paperwork
Released under CC BY-NC 4.0 — no access requests, no waiting, no gating. Download and start training today. The only condition is non-commercial use, which covers all research and challenge participation.
Reasoning2Action — Task Alignment
See how Retail-VLA-10K maps to the 10 official challenge tasks in the Reasoning2Action track.
Track 1 evaluates models across 10 progressively challenging manipulation tasks, ranging from basic to complex — including retail operations, logistics sorting, and long-horizon skills.
Retail-VLA-10K directly supports the two retail-specific tasks and provides the core manipulation primitives — grasping, reaching, placing — that underpin performance across the entire track.
11 Retail Manipulation Skills
Every episode is captured from a first-person (egocentric) perspective, designed to match the natural viewpoint of a deployed robot.
| Skill | Dataset ID | Episodes | Frames |
|---|---|---|---|
| Grasping | manipulation_grasping | 1,590 | 484,619 |
| Reaching | manipulation_reaching | 1,593 | 467,868 |
| Holding | manipulation_holding | 1,558 | 488,087 |
| Picking Up Item | manipulation_picking_up_item | 1,550 | 473,327 |
| Cart Pushing | manipulation_cart_pushing | 1,180 | 354,508 |
| Placing on Shelf | manipulation_placing_item_on_shelf | 1,267 | 395,428 |
| Placing in Cart | manipulation_placing_item_in_cart | 423 | 137,514 |
| Lifting | manipulation_lifting | 445 | 137,354 |
| Object Manipulation | manipulation_object_manipulation | 188 | 55,303 |
| Placing in Basket | manipulation_placing_item_in_basket | 153 | 47,938 |
| Holding Item | manipulation_holding_item | 176 | 53,760 |
| Total | 11 skills | 10,123 | 3,095,706 |
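As a quick sanity check, the per-skill counts in the table reproduce the stated totals. A minimal stdlib snippet (skill IDs and counts copied from the table):

```python
# Per-skill (episodes, frames) counts, copied from the skills table.
SKILLS = {
    "manipulation_grasping": (1_590, 484_619),
    "manipulation_reaching": (1_593, 467_868),
    "manipulation_holding": (1_558, 488_087),
    "manipulation_picking_up_item": (1_550, 473_327),
    "manipulation_cart_pushing": (1_180, 354_508),
    "manipulation_placing_item_on_shelf": (1_267, 395_428),
    "manipulation_placing_item_in_cart": (423, 137_514),
    "manipulation_lifting": (445, 137_354),
    "manipulation_object_manipulation": (188, 55_303),
    "manipulation_placing_item_in_basket": (153, 47_938),
    "manipulation_holding_item": (176, 53_760),
}

total_episodes = sum(e for e, _ in SKILLS.values())  # 10,123
total_frames = sum(f for _, f in SKILLS.values())    # 3,095,706
```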
Format & Structure
Plug-and-play with LeRobot v2.1 pipelines — the same format as the official AgiBot challenge dataset.
Video
640×480 H.264 video at 30 fps. Each skill is a self-contained sub-directory with meta/, data/, and videos/ folders matching the LeRobot v2.1 layout exactly.
Action Encoding
Actions encoded via LAPA (Latent Action Pretraining from Videos). 4 latent action indices per frame, codebook size 8, sequence length 4. No proprioceptive labels required.
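How you feed the 4 per-frame latent indices to a policy is up to your pipeline. One common choice is to flatten them into a single composite token, which with codebook size 8 yields a vocabulary of 8^4 = 4,096 actions. A minimal sketch (the helper name is hypothetical, not part of the dataset tooling):

```python
CODEBOOK_SIZE = 8        # from the dataset spec
INDICES_PER_FRAME = 4    # from the dataset spec

def flatten_latent_action(indices, codebook_size=CODEBOOK_SIZE):
    """Collapse per-frame latent indices into one composite token id.

    Treats the indices as base-8 digits, giving tokens in [0, 8**4).
    """
    token = 0
    for idx in indices:
        if not 0 <= idx < codebook_size:
            raise ValueError(f"index {idx} outside codebook [0, {codebook_size})")
        token = token * codebook_size + idx
    return token
```

For example, `flatten_latent_action([0, 0, 0, 0])` gives 0 and `flatten_latent_action([7, 7, 7, 7])` gives 4095, the largest token id.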
Annotations
Episodes include language task annotations and standard LeRobot metadata — episodes.jsonl, episodes_stats.jsonl, info.json, and tasks.jsonl.
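The `.jsonl` metadata files are plain newline-delimited JSON, so they need nothing beyond the standard library to inspect. A sketch (the record values below are illustrative, not real dataset entries; the `episode_index` / `tasks` / `length` keys follow the LeRobot v2.1 `episodes.jsonl` schema):

```python
import io
import json

def load_jsonl(stream):
    """Parse a LeRobot-style *.jsonl file: one JSON object per line."""
    return [json.loads(line) for line in stream if line.strip()]

# Illustrative stand-in for an open episodes.jsonl file.
sample = io.StringIO(
    '{"episode_index": 0, "tasks": ["place the item on the shelf"], "length": 312}\n'
    '{"episode_index": 1, "tasks": ["grasp the item"], "length": 287}\n'
)
episodes = load_jsonl(sample)
```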
Codebase Version
Packaged for LeRobot v2.1. Episode data stored as .parquet files per chunk. Identical structure to the official AgiBot Reasoning2Action dataset — mix them directly.
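Under the v2.1 layout, each episode's parquet file lives in a zero-padded chunk directory. A sketch of the path convention, assuming the LeRobot default of 1,000 episodes per chunk (verify against the `chunks_size` field in `meta/info.json` of the copy you download):

```python
def episode_parquet_path(episode_index, chunk_size=1_000):
    """Relative path of one episode's parquet file under data/.

    Assumes the LeRobot v2.1 convention: data/chunk-XXX/episode_XXXXXX.parquet,
    zero-padded, with chunk_size episodes per chunk.
    """
    chunk = episode_index // chunk_size
    return f"data/chunk-{chunk:03d}/episode_{episode_index:06d}.parquet"
```

For example, episode 0 resolves to `data/chunk-000/episode_000000.parquet`.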
Capture Perspective
First-person (egocentric) viewpoint throughout — matching the camera placement of humanoid and mobile manipulation robots. No viewpoint mismatch to compensate for.
License
CC BY-NC 4.0 — free for research and challenge participation. No access requests, no waiting. Attribution to DreamVu required. Non-commercial use only.
Ready to close the Sim2Real gap?
Download Retail-VLA-10K on Hugging Face and start training alongside the official AgiBot dataset today.