TartanGround

A Large-Scale Dataset for Ground Robot Perception and Navigation

Robotic Systems Lab, ETH Zurich | AirLab, Carnegie Mellon University
Under Review for IEEE Conference

70+ Photorealistic Environments · 910 Trajectories · 1.5M Samples · 3 Ground Robot Motions

Abstract

We present TartanGround, a large-scale, multi-modal dataset to advance the perception and autonomy of ground robots operating in diverse environments. The dataset, collected across various photorealistic simulation environments, includes multiple RGB stereo cameras for 360-degree coverage, along with depth, optical flow, stereo disparity, LiDAR point clouds, ground-truth poses, semantically segmented images, and occupancy maps with semantic labels. Data is collected with an integrated automatic pipeline that generates trajectories mimicking the motion patterns of various ground robot platforms, including wheeled and legged robots. We collect 910 trajectories across 70 environments, resulting in 1.5 million samples. Evaluations on occupancy prediction and SLAM tasks reveal that state-of-the-art methods trained on existing datasets struggle to generalize across diverse scenes. TartanGround can serve as a testbed for training and evaluating a broad range of learning-based tasks, including occupancy prediction, SLAM, neural scene representation, perception-based navigation, and more, enabling progress in robotic perception and autonomy towards robust models that generalize to more diverse scenarios.

TartanGround Overview

A trajectory from TartanGround (Winter Forest environment) includes multiple stereo RGB images covering a full 360° field of view, along with accurate depth and semantic annotations. It also provides ground-truth poses, LiDAR, IMU data, and semantic occupancy maps for comprehensive scene understanding.

The Dataset

TartanGround provides diverse and synchronized multi-modal data streams designed to support advanced robotic perception and learning tasks; a minimal loading sketch follows the feature list below.

6 RGB Stereo Camera Pairs

Front, back, left, right, top, and bottom stereo pairs for full 360° scene coverage.

Depth & Semantic Segmentation

Pixel-level depth maps and semantic labels aligned with stereo imagery.

Semantic Occupancy Maps

3D voxel grids with semantic labels for detailed spatial understanding.

LiDAR & IMU

Simulated LiDAR point clouds and inertial data for robust state estimation.

Optical Flow & Disparity

Dense motion and stereo disparity fields for temporal and depth supervision.

Ground Truth Poses

Accurate 6-DoF poses at each timestep for training and evaluation.

Proprioceptive Data

Joint states, velocities, and contact forces for quadruped trajectories.

Camera Resampling

Render new views with user-defined intrinsics and orientations, ideal for matching the camera setups of real robots.
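To make the modalities above concrete, the snippet below is a minimal sketch of loading one synchronized frame (RGB, depth, semantics, and pose) with NumPy and OpenCV. The folder names, file extensions, and pose convention shown here are illustrative assumptions, not the dataset's documented layout; please consult the official toolkit for the actual structure.

# Minimal sketch of loading one synchronized TartanGround frame.
# All file names, folder layout, and array conventions below are
# assumptions for illustration only.
from pathlib import Path

import numpy as np
import cv2

def load_frame(traj_dir: Path, idx: int):
    """Load RGB, depth, semantics, and pose for one timestep (hypothetical layout)."""
    stem = f"{idx:06d}"

    # Left image of the front-facing stereo pair (assumed PNG).
    rgb = cv2.imread(str(traj_dir / "image_lcam_front" / f"{stem}.png"))

    # Metric depth stored as a float32 array (assumed .npy).
    depth = np.load(traj_dir / "depth_lcam_front" / f"{stem}.npy")

    # Per-pixel semantic class IDs (assumed single-channel PNG).
    semantics = cv2.imread(
        str(traj_dir / "seg_lcam_front" / f"{stem}.png"), cv2.IMREAD_UNCHANGED
    )

    # Ground-truth poses: one row per frame, e.g. [x y z qx qy qz qw] (assumed).
    poses = np.loadtxt(traj_dir / "pose_lcam_front.txt")
    pose = poses[idx]

    return rgb, depth, semantics, pose

if __name__ == "__main__":
    rgb, depth, semantics, pose = load_frame(Path("Data/WinterForest/P000"), 0)
    print(rgb.shape, depth.shape, semantics.shape, pose)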

Environments

The TartanGround dataset features 74 photorealistic simulation environments carefully selected to cover a wide range of real-world conditions. These environments are categorized into six types: Indoor, Nature, Rural, Urban, Industrial/Infrastructure, and Historical/Thematic. This diversity supports robust generalization across varied terrain and lighting conditions.

Environment Categories

Overview of the environments in the TartanGround dataset

Applications

The TartanGround dataset can be used for training and evaluating various tasks such as semantic occupancy prediction, visual SLAM, neural scene representation, bird's-eye-view prediction, navigation, and more.
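As an example of using the ground-truth semantic occupancy maps for evaluation, the sketch below computes per-class IoU between a predicted and a ground-truth voxel grid. The dense integer encoding of class IDs (with 0 as free space) is an assumption made for illustration, not the dataset's exact format.

# Sketch of a per-class IoU evaluation for semantic occupancy prediction.
# Grids are assumed to be equally shaped integer arrays of class IDs,
# with 0 denoting free space (assumption for illustration).
import numpy as np

def semantic_occupancy_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Return IoU per semantic class over two equally shaped voxel grids."""
    ious = {}
    for c in range(1, num_classes):  # skip class 0 (free space)
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class absent in both grids
        inter = np.logical_and(pred_c, gt_c).sum()
        ious[c] = inter / union
    return ious

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.integers(0, 5, size=(64, 64, 32))
    pred = gt.copy()
    pred[rng.random(pred.shape) < 0.1] = 0  # corrupt 10% of voxels
    print(semantic_occupancy_iou(pred, gt, num_classes=5))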

License

The TartanGround dataset is licensed under the Creative Commons Attribution 4.0 International License. The accompanying toolkit and codebase are released under the MIT License.

BibTeX

@ARTICLE{patel25tartanground,
      author={Patel, Manthan and Yang, Fan and Qiu, Yuheng and Cadena, Cesar and Scherer, Sebastian and Hutter, Marco and Wang, Wenshan},
      journal={Under review for IEEE conference},
      title={TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation},
      year={2025},
}