70+ Photorealistic Environments · 910 Trajectories · 1.5M Samples · 3 Ground Robot Motions
We present TartanGround, a large-scale, multi-modal dataset to advance the perception and autonomy of ground robots operating in diverse environments. The dataset, collected in a variety of photorealistic simulation environments, includes multiple RGB stereo cameras for 360-degree coverage, along with depth, optical flow, stereo disparity, LiDAR point clouds, ground-truth poses, semantically segmented images, and occupancy maps with semantic labels. Data is collected with an integrated automatic pipeline that generates trajectories mimicking the motion patterns of various ground robot platforms, including wheeled and legged robots. We collect 910 trajectories across 70 environments, resulting in 1.5 million samples. Evaluations on occupancy prediction and SLAM tasks reveal that state-of-the-art methods trained on existing datasets struggle to generalize across diverse scenes. TartanGround can serve as a testbed for training and evaluating a broad range of learning-based tasks, including occupancy prediction, SLAM, neural scene representation, perception-based navigation, and more, enabling advances in robotic perception and autonomy towards robust models that generalize to more diverse scenarios.
A trajectory from TartanGround (Winter Forest environment) includes multiple stereo RGB images covering a full 360° field of view, along with accurate depth and semantic annotations. It also provides ground-truth poses, LiDAR, IMU data, and semantic occupancy maps for comprehensive scene understanding.
TartanGround provides diverse, synchronized multi-modal data streams designed to support advanced robotic perception and learning tasks; a minimal loading sketch follows the list below.
Front, back, left, right, top, and bottom stereo pairs for full 360° scene coverage.
Pixel-level depth maps and semantic labels aligned with stereo imagery.
3D voxel grids with semantic labels for detailed spatial understanding.
Simulated LiDAR point clouds and inertial data for robust state estimation.
Dense motion and stereo disparity fields for temporal and depth supervision.
Accurate 6-DoF poses at each timestep for training and evaluation.
Joint states, velocities, and contact forces for quadruped trajectories.
Render new views with user-defined intrinsics and orientations, ideal for alignment with real robot setups.
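Loading these synchronized streams is handled by the released toolkit; as a rough illustration only, the sketch below assumes a hypothetical per-trajectory folder layout with per-frame PNG images, NumPy depth/semantic/LiDAR arrays, and a pose text file (one x, y, z, qx, qy, qz, qw row per frame). The directory and file names here are assumptions for illustration, not the official TartanGround format.

```python
# Hypothetical loading sketch -- folder layout and file names are assumed,
# not the official TartanGround toolkit API.
from pathlib import Path

import cv2
import numpy as np


def load_sample(traj_dir: Path, frame_idx: int) -> dict:
    """Load one multi-modal frame from an (assumed) trajectory directory."""
    idx = f"{frame_idx:06d}"

    # Stereo RGB for one of the six camera directions (front shown here).
    rgb_left = cv2.imread(str(traj_dir / "image_lcam_front" / f"{idx}.png"))
    rgb_right = cv2.imread(str(traj_dir / "image_rcam_front" / f"{idx}.png"))

    # Metric depth and per-pixel semantic labels aligned with the left camera.
    depth = np.load(traj_dir / "depth_lcam_front" / f"{idx}.npy")
    semantics = np.load(traj_dir / "seg_lcam_front" / f"{idx}.npy")

    # LiDAR point cloud (N x 3) and the 6-DoF ground-truth pose,
    # assumed stored as one x, y, z, qx, qy, qz, qw row per frame.
    lidar = np.load(traj_dir / "lidar" / f"{idx}.npy")
    pose = np.loadtxt(traj_dir / "pose_lcam_front.txt")[frame_idx]

    return dict(rgb=(rgb_left, rgb_right), depth=depth,
                semantics=semantics, lidar=lidar, pose=pose)


# Example usage with a hypothetical trajectory path.
sample = load_sample(Path("Data/WinterForest/P000"), frame_idx=0)
print(sample["depth"].shape, sample["pose"])
```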
The TartanGround dataset features 74 photorealistic simulation environments carefully selected to cover a wide range of real-world conditions. These environments are categorized into six types: Indoor, Nature, Rural, Urban, Industrial/Infrastructure, and Historical/Thematic. This diversity supports robust generalization across varied terrain and lighting conditions.
Overview of the environments in the TartanGround dataset
The TartanGround dataset can be used for training and evaluating a variety of tasks, such as semantic occupancy prediction, visual SLAM, neural scene representation, bird's-eye-view prediction, navigation, and more.
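As one concrete example of such an evaluation, the sketch below computes per-class IoU between a predicted and a ground-truth semantic voxel grid, the kind of metric typically reported for occupancy prediction. The grid shape and the label convention (0 = free/unobserved) are assumptions for illustration, not the dataset's official evaluation protocol.

```python
# Minimal per-class IoU sketch for semantic occupancy prediction.
# Label convention (0 = free/unobserved) and grid shape are assumptions.
import numpy as np


def semantic_occupancy_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Per-class IoU between integer label grids of identical shape."""
    ious = {}
    for c in range(1, num_classes):  # skip label 0 (free/unobserved)
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, gt_c).sum() / union
    miou = float(np.mean(list(ious.values()))) if ious else 0.0
    return ious, miou


# Random grids stand in for a real prediction and ground truth.
rng = np.random.default_rng(0)
pred = rng.integers(0, 5, size=(64, 64, 32))
gt = rng.integers(0, 5, size=(64, 64, 32))
per_class, miou = semantic_occupancy_iou(pred, gt, num_classes=5)
print(f"mIoU over occupied classes: {miou:.3f}")
```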
The TartanGround dataset is licensed under the Creative Commons Attribution 4.0 International License. The accompanying toolkit and codebase are released under the MIT License.
@ARTICLE{patel25tartanground,
author={Patel, Manthan and Yang, Fan and Qiu, Yuheng and Cadena, Cesar and Scherer, Sebastian and Hutter, Marco and Wang, Wenshan},
journal={Under review for an IEEE conference},
title={TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation},
year={2025},
}