The world's leading AI researchers are building world models and spatial intelligence. They need high-fidelity 3D training data from real environments. DreamVu's omnidirectional capture platform delivers it at scale.
Fei-Fei Li — the Stanford professor who created ImageNet and catalyzed the deep learning revolution — has made spatial intelligence the focus of her latest company, World Labs. Her thesis: AI must learn to perceive, reason about, and act in three-dimensional space. Not from text. Not from flat images. From spatially rich, real-world data.
"Spatial intelligence is the next major capability AI needs to develop. It's how humans and animals make sense of the world — and it's what's missing from today's AI systems."
Yann LeCun — Meta's Chief AI Scientist and Turing Award winner — has been equally direct. He argues that the path to truly intelligent machines runs through world models: internal representations of how the physical world works, learned from observation, not text.
"A system trained on text will never understand the physical world. You need world models — learned from video and sensory data — that can predict what happens next."
Both visions share a common prerequisite: massive amounts of high-fidelity, spatially aware, real-world 3D data. And that's exactly what doesn't exist today — at least, not at the scale or quality these models demand.
VLA (Vision-Language-Action) models can't learn physics, spatial relationships, or manipulation skills from 2D images and text. They need dense 3D captures of real environments with real people performing real tasks.
Billions have been poured into model architectures — GR00T, RT-2, Octo, π₀ — but the training data barely exists. Open-source robotics datasets are small, narrow-FOV, and lack the 3D spatial richness these models require.
Our patented omnidirectional 3D capture technology produces exactly the data that world models and spatial AI systems need — 360° depth + RGB, 3D occupancy maps, semantic labels, and skill segmentation at scale.
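To make the modalities concrete, one synchronized capture sample could be organized per frame roughly like this. This is a minimal illustrative sketch; the field names, shapes, and types are our assumptions, not DreamVu's actual schema:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class CaptureFrame:
    """One synchronized omnidirectional capture sample (illustrative schema)."""
    timestamp_ns: int                    # hardware-sync timestamp
    rgb_pano: np.ndarray                 # (H, W, 3) uint8 equirectangular panorama
    depth_pano: np.ndarray               # (H, W) float32 metric depth, in metres
    occupancy: np.ndarray                # (X, Y, Z) bool voxel occupancy grid
    semantic_labels: dict = field(default_factory=dict)  # instance id -> class name
    skill_segment: str = ""              # e.g. "pick", "place", "scan"


# A tiny synthetic frame, just to show how the modalities line up per sample
frame = CaptureFrame(
    timestamp_ns=0,
    rgb_pano=np.zeros((512, 1024, 3), dtype=np.uint8),
    depth_pano=np.ones((512, 1024), dtype=np.float32),
    occupancy=np.zeros((64, 64, 16), dtype=bool),
    semantic_labels={1: "shelf", 2: "person"},
    skill_segment="pick",
)
print(frame.rgb_pano.shape, frame.skill_segment)
```

Keeping depth, occupancy, labels, and skill segmentation on the same timestamp is what lets downstream training treat each frame as one coherent spatial observation.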
Humanoids need egocentric views (what they see) and exocentric views (how they appear to others). Traditional capture misses half the picture. DreamVu's 360° capture gives you both — simultaneously.
Physical AI models need vision + language + action data together. Most datasets provide vision only — leaving teams to stitch together incomplete signals. DreamVu delivers all three, synchronized.
Humanoids trained in simulation fail when deployed in real environments. DreamVu captures the real world in formats that translate directly into Isaac Sim and back — closing the sim-to-real loop.
Hardware-synced egocentric + 360° exocentric cameras with full RGB + depth in real environments
AI-assisted annotation (SAM2, Grounding DINO) plus human QA delivers vision, language, and action labels — 10× faster than traditional 3D annotation
3D Gaussian Splatting creates photorealistic scenes with all annotations preserved — ready for simulation conversion
Automated USD export with physics properties for NVIDIA Isaac Sim and Unreal Engine 5.3+
1,000+ frames/hour with domain randomization — all modalities and skill transfer demos preserved
Continuous verification: sim-to-real transfer rates, manipulation success, and skill transfer effectiveness
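The step where 2D labels carry into the 3D scene can be illustrated with basic geometry: given per-pixel depth, each labeled pixel back-projects to a 3D point, so a 2D mask becomes a labeled point cluster in the reconstruction. Below is a minimal sketch assuming a simple pinhole depth model for a single view; DreamVu's actual omnidirectional projection model would differ:

```python
import numpy as np


def backproject_mask(depth, mask, fx, fy, cx, cy):
    """Lift labeled 2D pixels into 3D camera-frame points using depth.

    depth: (H, W) metric depth in metres; mask: (H, W) bool label mask.
    Returns an (N, 3) array of points for the masked pixels (pinhole model).
    """
    v, u = np.nonzero(mask)      # pixel rows (v) and columns (u) inside the mask
    z = depth[v, u]
    x = (u - cx) * z / fx        # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Toy example: a flat surface 2 m away with a 2x2-pixel labeled region
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
pts = backproject_mask(depth, mask, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(pts.shape)  # four labeled pixels become four 3D points
```

Run per frame across a full capture, this is the basic mechanism by which masks and class labels propagate from 2D imagery into the 3D scene without manual re-annotation.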
Developed from breakthrough research at IIIT Hyderabad (published at CVPR 2016) and refined over 8 years of production deployment in autonomous mobile robots, UV disinfection systems, and industrial applications worldwide.
Alia is the only camera that combines full 360° coverage with long-range, high-resolution 3D depth sensing in a single compact unit — with on-board edge AI processing. This proprietary technology creates a defensible moat: we capture spatial data that no other company can replicate.
Traditional cameras see 60–90°. In a warehouse or retail environment, most action happens outside that cone. Alia captures everything in all directions simultaneously.
When multiple humans and robots demonstrate tasks throughout an environment, one Alia captures all demonstrations happening anywhere in the space — no repositioning required.
The 360° coverage provides ideal input for photorealistic 3D reconstruction. All multimodal annotations propagate automatically from 2D frames to the 3D scene.
Protected omnidirectional 3D vision technology with 8+ years of production deployment. A defensible competitive advantage that ensures unique data capture capabilities.
A grocery store contains more distinct manipulation tasks per square foot than almost any other environment — making it the ideal proving ground for Physical AI.
Picking, placing, stacking, scanning, bagging, mopping, organizing — 500+ distinct skills captured across customer, staff, and logistics operations.
Autonomous restocking and checkout are among the highest-demand use cases for humanoid robots, targeting the $22B retail machine-vision market.
If a VLA model can handle a cluttered grocery aisle with customers, carts, and staff in motion, it transfers to warehouses, fulfillment centers, and retail at large.
A curated 20–30 hour subset available in LeRobot format — try before you buy, benchmark against your existing training data.
Isaac Sim-native USD scenes, GR00T training pipeline integration, Isaac Lab compatibility, and Omniverse support
Open teaser dataset in LeRobot and RLDS formats — discoverable by the global research community
Full Open X-Embodiment compatibility — seamless integration with existing VLA training pipelines
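For readers unfamiliar with the RLDS convention used across Open X-Embodiment, data is stored as episodes of steps, each pairing an observation with the action taken plus boundary flags. A toy episode in that spirit, in pure Python (real pipelines read these via TFDS; field names beyond the standard RLDS keys are illustrative):

```python
def make_step(obs, action, is_first=False, is_last=False):
    """One RLDS-style step: observation, action, and episode-boundary flags."""
    return {
        "observation": obs,
        "action": action,
        "is_first": is_first,
        "is_last": is_last,
        "is_terminal": is_last,
    }


# A three-step toy episode with a language instruction, RLDS-style
episode = {
    "steps": [
        make_step({"image": "frame_000.png"}, [0.0, 0.1], is_first=True),
        make_step({"image": "frame_001.png"}, [0.1, 0.0]),
        make_step({"image": "frame_002.png"}, [0.0, 0.0], is_last=True),
    ],
    "language_instruction": "pick up the cereal box",
}
print(len(episode["steps"]))
```

Because VLA training code already consumes this episode/step layout, shipping data in LeRobot and RLDS form means teams can point existing pipelines at it with minimal glue.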
Foundation model teams at NVIDIA, Figure AI, 1X Technologies, and Agility Robotics building VLA models that need diverse, high-quality 3D training data.
Embodiment research programs at Google DeepMind, Meta FAIR, and OpenAI pushing the boundaries of spatial intelligence and world models.
Large retailers and logistics operators — Amazon Robotics, Walmart, Ocado, DHL — investing in automation and needing environment-specific training data.
Managing Partner at SRI Capital. Founder & former CEO of AppLabs (acquired by CSC). PhD Wharton, MS NYU, BTech IIT Delhi.
BS & MS in computer vision from IIIT Hyderabad. His CVPR'16 paper on computational cameras became the seed for DreamVu.
Professor at IIIT Hyderabad. 75+ published papers. Built systems currently deployed at massive scale.
PhD in Computational Photography from IIT Hyderabad. Eight years focused on machine learning and high-performance computing for computer vision.