LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

¹ETH Zürich, ²Max Planck Institute for Intelligent Systems, ³Microsoft

LEAP-VO leverages long-term point tracking to build a robust visual odometry system that excels in managing occlusions and dynamic environments.

Abstract

Visual odometry estimates the motion of a moving camera based on visual input. Existing methods, which mostly focus on two-view point tracking, often ignore the rich temporal context in the image sequence. As a result, they overlook global motion patterns and provide no assessment of the reliability of the full trajectory. These shortcomings hinder performance in scenarios with occlusion, dynamic objects, and low-texture areas. To address these challenges, we present the Long-term Effective Any Point Tracking (LEAP) module. LEAP combines visual, inter-track, and temporal cues with carefully selected anchors for dynamic track estimation. Moreover, LEAP's temporal probabilistic formulation integrates distribution updates into a learnable iterative refinement module to reason about point-wise uncertainty. Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes. This integration demonstrates a novel practice: employing long-term point tracking as the front-end of a visual odometry system. Extensive experiments demonstrate that the proposed pipeline significantly outperforms existing baselines across various visual odometry benchmarks.

Method

LEAP Front-end: After image feature maps are extracted, additional anchor points are selected to assist in tracking the query points. A refiner processes the queries and anchors together and iteratively updates their states, aggregating channel, inter-track, and temporal information. The LEAP tracker outputs a trajectory distribution, visibility, and a dynamic track label for each point.
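The three tracker outputs can be pictured as one record per tracked point. The sketch below is only an illustration under assumptions: the `TrackOutput` container, its field names, the `label_track` helper, and both thresholds are hypothetical and are not taken from the LEAP-VO code or paper. It shows how a dynamic probability and a per-frame distribution spread could be turned into the static, dynamic, and uncertain groups.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrackOutput:
    """Per-point tracker output over a window of frames (hypothetical layout)."""
    mean: List[Tuple[float, float]]  # estimated (x, y) position in each frame
    scale: List[float]               # per-frame spread of the position distribution
    visible: List[bool]              # per-frame visibility flag
    dynamic_prob: float              # probability that the point lies on a moving object


def label_track(track: TrackOutput,
                dynamic_thresh: float = 0.5,
                scale_thresh: float = 2.0) -> str:
    """Return 'dynamic', 'uncertain', or 'static' for one track.

    Both thresholds are illustrative values, not taken from the paper.
    """
    if track.dynamic_prob > dynamic_thresh:
        return "dynamic"
    # Average the distribution spread over the frames where the point is visible.
    visible_scales = [s for s, v in zip(track.scale, track.visible) if v]
    if not visible_scales:
        return "uncertain"  # never observed, so there is nothing reliable to use
    if sum(visible_scales) / len(visible_scales) > scale_thresh:
        return "uncertain"
    return "static"
```

In this sketch the dynamic check runs first, so a confidently moving point is reported as dynamic even when its position estimate is also noisy; a different ordering would be an equally plausible design.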

LEAP-VO: For each incoming image, the feature extractor detects new keypoints. These keypoints are then tracked across all other frames within the current LEAP window, and a track filtering step removes outliers. Finally, the local bundle adjustment (BA) module runs on the current BA window to update the camera poses and the 3D positions of the extracted keypoints.
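The per-frame flow described above can be sketched as a small sliding-window loop. This is a minimal skeleton under assumptions, not the authors' implementation: the class name `SlidingWindowVO`, the injected callables (`extract`, `track`, `keep`, `bundle_adjust`), and the default window sizes are all hypothetical placeholders. The two windows are kept as separate buffers because the text does not state how their sizes relate.

```python
from collections import deque
from typing import Callable, Deque


class SlidingWindowVO:
    """Skeleton of the per-frame loop; the real components are injected as callables."""

    def __init__(self, extract: Callable, track: Callable, keep: Callable,
                 bundle_adjust: Callable, track_window: int = 8, ba_window: int = 8):
        self.extract = extract
        self.track = track
        self.keep = keep
        self.bundle_adjust = bundle_adjust
        # deque(maxlen=...) drops the oldest frame automatically once a window is full.
        self.track_frames: Deque = deque(maxlen=track_window)
        self.ba_frames: Deque = deque(maxlen=ba_window)

    def step(self, image):
        self.track_frames.append(image)
        self.ba_frames.append(image)
        keypoints = self.extract(image)                          # 1. detect new keypoints
        tracks = self.track(keypoints, list(self.track_frames))  # 2. track across the window
        tracks = [t for t in tracks if self.keep(t)]             # 3. filter outlier tracks
        return self.bundle_adjust(tracks, list(self.ba_frames))  # 4. local BA on the BA window
```

A caller would construct the object once with its own feature extractor, tracker, filter predicate, and BA routine, then call `step(image)` for every new frame.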

Qualitative Results

Qualitative results for visual odometry on MPI-Sintel. Upper left: image sample with static (green) point tracks. Lower left: image sample with dynamic (red) and uncertain (yellow) point tracks. Right: comparison with state-of-the-art VO methods.

Dynamic Track Estimation

Visualization of dynamic track estimation on DAVIS, MPI-Sintel, and TartanAir-Shibuya. Odd columns: all point trajectories. Even columns: estimated dynamic point trajectories.

BibTeX

@article{chen2024leap,
          title={LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry},
          author={Chen, Weirong and Chen, Le and Wang, Rui and Pollefeys, Marc},
          journal={arXiv preprint arXiv:2401.01887},
          year={2024}
}