70+ Photorealistic Environments · 910 Trajectories · 1.5M Samples · 3 Ground Robot Motions
We present TartanGround, a large-scale, multi-modal dataset to advance the perception and autonomy of ground robots operating in diverse environments. The dataset, collected in a variety of photorealistic simulation environments, includes multiple RGB stereo cameras for 360-degree coverage, along with depth, optical flow, stereo disparity, LiDAR point clouds, ground-truth poses, semantically segmented images, and occupancy maps with semantic labels. Data is collected with an integrated automatic pipeline that generates trajectories mimicking the motion patterns of various ground robot platforms, including wheeled and legged robots. We collect 910 trajectories across 70 environments, resulting in 1.5 million samples. Evaluations on occupancy prediction and SLAM tasks reveal that state-of-the-art methods trained on existing datasets struggle to generalize across diverse scenes. TartanGround can serve as a testbed for training and evaluating a broad range of learning-based tasks, including occupancy prediction, SLAM, neural scene representation, perception-based navigation, and more, enabling advances in robotic perception and autonomy towards robust models that generalize to more diverse scenarios.
A trajectory from TartanGround (Winter Forest environment) includes multiple stereo RGB images covering a full 360° field of view, along with accurate depth and semantic annotations. It also provides ground-truth poses, LiDAR, IMU data, and semantic occupancy maps for comprehensive scene understanding.
TartanGround provides diverse, synchronized multi-modal data streams designed to support advanced robotic perception and learning tasks; a minimal loading sketch follows the list below.
Front, back, left, right, top, and bottom stereo pairs for full 360° scene coverage.
Pixel-level depth maps and semantic labels aligned with stereo imagery.
3D voxel grids with semantic labels for detailed spatial understanding.
Simulated LiDAR point clouds and inertial data for robust state estimation.
Dense motion and stereo disparity fields for temporal and depth supervision.
Accurate 6-DoF poses at each timestep for training and evaluation.
Joint states, velocities, and contact forces for quadruped trajectories.
Render new views with user-defined intrinsics and orientations, ideal for alignment with real robot setups.
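Loading these synchronized streams is handled by the released toolkit; as a rough illustration only, the sketch below assumes a hypothetical per-trajectory folder layout with per-frame PNG images, NumPy depth/semantic/LiDAR arrays, and a pose text file (one x, y, z, qx, qy, qz, qw row per frame). The directory and file names here are assumptions for illustration, not the official TartanGround format.

```python
# Hypothetical loading sketch -- folder layout and file names are assumed,
# not the official TartanGround toolkit API.
from pathlib import Path

import cv2
import numpy as np


def load_sample(traj_dir: Path, frame_idx: int) -> dict:
    """Load one multi-modal frame from an (assumed) trajectory directory."""
    idx = f"{frame_idx:06d}"

    # Stereo RGB for one of the six camera directions (front shown here).
    rgb_left = cv2.imread(str(traj_dir / "image_lcam_front" / f"{idx}.png"))
    rgb_right = cv2.imread(str(traj_dir / "image_rcam_front" / f"{idx}.png"))

    # Metric depth and per-pixel semantic labels aligned with the left camera.
    depth = np.load(traj_dir / "depth_lcam_front" / f"{idx}.npy")
    semantics = np.load(traj_dir / "seg_lcam_front" / f"{idx}.npy")

    # LiDAR point cloud (N x 3) and the 6-DoF ground-truth pose,
    # assumed stored as one x, y, z, qx, qy, qz, qw row per frame.
    lidar = np.load(traj_dir / "lidar" / f"{idx}.npy")
    pose = np.loadtxt(traj_dir / "pose_lcam_front.txt")[frame_idx]

    return dict(rgb=(rgb_left, rgb_right), depth=depth,
                semantics=semantics, lidar=lidar, pose=pose)


# Example usage with a hypothetical trajectory path.
sample = load_sample(Path("Data/WinterForest/P000"), frame_idx=0)
print(sample["depth"].shape, sample["pose"])
```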
The TartanGround dataset features 74 photorealistic simulation environments carefully selected to cover a wide range of real-world conditions. These environments are categorized into six types: Indoor, Nature, Rural, Urban, Industrial/Infrastructure, and Historical/Thematic. This diversity supports robust generalization across varied terrain and lighting conditions.
Overview of the environments in the TartanGround dataset
The TartanGround dataset can be used for training and evaluating a variety of tasks, such as semantic occupancy prediction, visual SLAM, neural scene representation, bird's-eye-view prediction, navigation, and more.
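As one concrete example of such an evaluation, the sketch below computes per-class IoU between a predicted and a ground-truth semantic voxel grid, the kind of metric typically reported for occupancy prediction. The grid shape and the label convention (0 = free/unobserved) are assumptions for illustration, not the dataset's official evaluation protocol.

```python
# Minimal per-class IoU sketch for semantic occupancy prediction.
# Label convention (0 = free/unobserved) and grid shape are assumptions.
import numpy as np


def semantic_occupancy_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Per-class IoU between integer label grids of identical shape."""
    ious = {}
    for c in range(1, num_classes):  # skip label 0 (free/unobserved)
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, gt_c).sum() / union
    miou = float(np.mean(list(ious.values()))) if ious else 0.0
    return ious, miou


# Random grids stand in for a real prediction and ground truth.
rng = np.random.default_rng(0)
pred = rng.integers(0, 5, size=(64, 64, 32))
gt = rng.integers(0, 5, size=(64, 64, 32))
per_class, miou = semantic_occupancy_iou(pred, gt, num_classes=5)
print(f"mIoU over occupied classes: {miou:.3f}")
```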
The TartanGround dataset is licensed under the Creative Commons Attribution 4.0 International License. The accompanying toolkit and codebase are released under the MIT License.
@ARTICLE{patel25tartanground,
author={Patel, Manthan and Yang, Fan and Qiu, Yuheng and Cadena, Cesar and Scherer, Sebastian and Hutter, Marco and Wang, Wenshan},
journal={Under review for an IEEE conference},
title={TartanGround: A Large-Scale Dataset for Ground Robot Perception and Navigation},
year={2025},
}