LEAP-VO: Long-term Effective Any Point Tracking
for Visual Odometry

CVPR 2024

1TU Munich    2Munich Center for Machine Learning    3MPI for Intelligent Systems    4Microsoft

LEAP-VO is a robust visual odometry system that leverages temporal context via long-term point tracking for motion estimation, occlusion handling, and track-probability modeling.

Abstract

Visual odometry estimates the motion of a moving camera based on visual input. Existing methods, mostly focusing on two-view point tracking, often ignore the rich temporal context in the image sequence, thereby overlooking global motion patterns and providing no assessment of full-trajectory reliability. These shortcomings hinder performance in scenarios with occlusion, dynamic objects, and low-texture areas. To address these challenges, we present the Long-term Effective Any Point Tracking (LEAP) module. LEAP innovatively combines visual, inter-track, and temporal cues with carefully selected anchors for dynamic track estimation. Moreover, LEAP's temporal probabilistic formulation integrates distribution updates into a learnable iterative refinement module to reason about point-wise uncertainty. Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes. Our integration showcases a novel practice: employing long-term point tracking as the VO front-end. Extensive experiments demonstrate that the proposed pipeline significantly outperforms existing baselines across various visual odometry benchmarks.

Method

LEAP Front-end: After image feature maps are extracted, carefully selected anchors assist tracking. The queries and anchors are processed by a refiner that iteratively updates their states, aggregating channel, inter-track, and temporal information. The LEAP tracker outputs the trajectory distribution, visibility estimates, and dynamic-track labels.
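The front-end interface described above can be sketched as follows. This is a minimal toy sketch, not the released implementation: the function name `leap_track`, the dummy refinement step, and the thresholds are all illustrative stand-ins for the learned transformer refiner and its probabilistic outputs.

```python
import numpy as np

def leap_track(queries, window_len, num_iters=4, rng=None):
    """Toy sketch of the LEAP tracker interface (names are illustrative).

    queries: (N, 2) query point coordinates in the first window frame.
    Returns per-frame track means, variances (point-wise uncertainty),
    visibility flags, and a per-track dynamic label.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(queries)
    # Initialize each track as its query replicated across the window.
    mean = np.repeat(queries[None], window_len, axis=0)   # (T, N, 2)
    log_var = np.zeros((window_len, n))                   # per-point log-variance
    for _ in range(num_iters):
        # In LEAP, this delta comes from the learned refiner aggregating
        # channel, inter-track, and temporal cues; here a dummy update.
        delta = 0.1 * rng.standard_normal(mean.shape)
        mean = mean + delta
        log_var = log_var - 0.1  # refinement shrinks the uncertainty
    visibility = np.exp(log_var) < 1.0                    # (T, N) bool
    # Label a track dynamic if its motion deviates from the dominant flow
    # (illustrative threshold, not the paper's criterion).
    motion = mean[-1] - mean[0]
    dynamic = np.linalg.norm(motion - motion.mean(0), axis=1) > 2.0
    return mean, np.exp(log_var), visibility, dynamic
```

The key design point mirrored here is that uncertainty is carried alongside position through every refinement iteration, so the distribution (not just a point estimate) is available to the downstream filtering stage.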


LEAP-VO: Given an incoming image, the feature extractor extracts new keypoints. All keypoints are then tracked across the other frames within the current LEAP window, followed by a track-filtering step that removes outliers. Finally, a local bundle adjustment (BA) module operates on the current BA window to update the camera poses and the 3D positions of the extracted keypoints.
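The per-frame loop above can be summarized in pseudocode-style Python. This is a hedged sketch: every helper (`extract_keypoints`, `track_in_window`, `filter_tracks`, `local_bundle_adjustment`) is a trivial stand-in whose name and signature are assumptions, shown only to make the window/filter/BA data flow concrete.

```python
import numpy as np
from collections import deque

def extract_keypoints(frame, num=16, rng=np.random.default_rng(0)):
    # Stand-in for the feature extractor: random keypoints in the image.
    h, w = frame.shape[:2]
    return rng.uniform([0, 0], [w, h], size=(num, 2))

def track_in_window(keypoints, window_frames):
    # Stand-in for the LEAP tracker: replicate points across the window
    # and attach a per-track confidence score.
    t = len(window_frames)
    tracks = np.repeat(keypoints[None], t, axis=0)  # (T, N, 2)
    conf = np.ones(len(keypoints))
    return tracks, conf

def filter_tracks(tracks, conf, thresh=0.5):
    # Keep only confident tracks (drops dynamic/uncertain points in LEAP-VO).
    keep = conf >= thresh
    return tracks[:, keep], conf[keep]

def local_bundle_adjustment(tracks, recent_poses):
    # Stand-in for local BA over the BA window: returns an identity pose.
    return np.eye(4)

def run_leap_vo(frames, leap_window=12, ba_window=8):
    """Sliding-window VO loop: track, filter, then locally bundle-adjust."""
    window = deque(maxlen=leap_window)
    poses = []
    for frame in frames:
        window.append(frame)
        kps = extract_keypoints(frame)
        tracks, conf = track_in_window(kps, list(window))
        tracks, conf = filter_tracks(tracks, conf)
        poses.append(local_bundle_adjustment(tracks, poses[-ba_window:]))
    return poses
```

Note the two window sizes: the LEAP window bounds how far back points are tracked, while the (typically smaller) BA window bounds which recent poses the local optimization refines.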


Qualitative Results

Qualitative results for visual odometry on MPI-Sintel. Upper left: image sample with static (green) point tracks. Lower left: image sample with dynamic (red) and uncertain (yellow) point tracks. Right: comparison with state-of-the-art VO methods.

Dynamic Track Estimation

Visualization of dynamic track estimation on DAVIS, MPI-Sintel, and TartanAir-Shibuya. Odd columns: all point trajectories. Even columns: estimated dynamic point trajectories.


BibTeX

@InProceedings{chen2024leap,
      title={LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry},
      author={Chen, Weirong and Chen, Le and Wang, Rui and Pollefeys, Marc},
      booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2024}
}