The universality and robustness of existing place recognition methods are significantly affected by differences in the characteristics of multi-modal LiDARs. To address this issue, this paper proposes a universal place recognition method designed for multi-modal LiDARs. The method constructs curved-surface features and triangle descriptors from curved voxels that adapt to the LiDAR beam distribution. In the global retrieval stage, we introduce an FOV Alignment method to handle the matching of LiDARs with significant differences in FOV or viewpoint. For fine verification, places are effectively associated by combining the local occupancy information of vertices and the spatial topological relationship of triangles. Additionally, a triangle pose refinement algorithm is proposed to estimate the precise 6-DoF pose. Extensive experiments are conducted on four public datasets, covering both single-LiDAR and cross-LiDAR settings. The results demonstrate that the proposed method outperforms existing methods in universality and robustness. The code will be released as open source upon acceptance.
The system framework of CVAT is illustrated in the figure. After pre-processing the input point cloud, curved-surface features and triangle descriptors are generated based on the curved voxel map. In the place recognition stage, candidate places are first selected through FOV Alignment for coarse retrieval and then subjected to fine verification with the triangle descriptors. Finally, the place similarity is computed and the 6-DoF pose is estimated.
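As a rough illustration of the curved-voxel idea, the sketch below partitions points in spherical coordinates (range, azimuth, elevation), so cells follow the LiDAR beam geometry rather than a Cartesian grid. The function name `curved_voxel_index` and the resolution parameters are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

def curved_voxel_index(points, r_res=1.0, az_res_deg=2.0, el_res_deg=2.0):
    """Assign each 3-D point to a curved-voxel cell indexed by quantized
    (range, azimuth, elevation), so the partition bends with the beams
    instead of following a Cartesian grid."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rng = np.sqrt(x**2 + y**2 + z**2)                  # radial distance
    az = np.degrees(np.arctan2(y, x)) % 360.0          # azimuth in [0, 360)
    el = np.degrees(np.arcsin(np.clip(z / np.maximum(rng, 1e-9), -1.0, 1.0)))
    return np.stack([rng // r_res,
                     az // az_res_deg,
                     (el + 90.0) // el_res_deg], axis=1).astype(int)

pts = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 1.0]])
print(curved_voxel_index(pts))
```

Points that fall in the same (range, azimuth, elevation) bin share a cell regardless of how sparse or dense the scan pattern is, which is what makes a beam-adaptive partition attractive across LiDAR modalities.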
The experimental results are summarized in Table I. Our method achieves superior performance on the KITTI and KITTI-360 datasets, remaining stable even when handling reverse loops and large-scale scenes. On the HeLiPR and MCD datasets, challenges such as occlusions, narrow FOV, and non-repetitive scanning cause most methods to lose accuracy or fail outright. Nevertheless, our approach, which combines the FOV Alignment strategy with coarse-to-fine verification, consistently delivers satisfactory performance.
For 6-DoF pose estimation, our algorithm outperforms the compared methods in most sequences. Even in the Riverside04 sequence, where crossing a bridge introduces numerous invalid LiDAR points from water-surface reflections, the multi-stage pose optimization keeps the proposed method robust. The figure illustrates the point cloud registration results using the 6-DoF poses estimated by our algorithm under 180° viewpoint differences and a narrow FOV. By combining RANSAC with Triangle Pose Refinement, the algorithm achieves minimal registration ghosting and high-quality alignment.
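The per-triangle pose solve underlying this kind of refinement can be sketched with the standard closed-form Kabsch/SVD rigid alignment: each matched vertex triple yields a candidate 6-DoF transform that a RANSAC-style loop can score and refine. This is a generic sketch of the building block, not the paper's exact Triangle Pose Refinement.

```python
import numpy as np

def rigid_transform(src, dst):
    """Closed-form least-squares rigid transform (Kabsch/SVD) mapping
    src points onto dst points; returns rotation R and translation t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

# One matched triangle: a 90° yaw plus a translation.
src = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, [1.0, 2.0, 3.0]))
```

Because each triangle gives an independent transform hypothesis, outlier matches (e.g. from water-surface reflections) can be rejected by consensus before a final joint refinement over the inliers.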
The experimental results are presented in Fig. 6. In most cross-LiDAR matching scenarios, single-LiDAR place recognition methods fail to produce reliable results. The descriptors of SC and SOLiD are unable to handle cases with large FOV differences between LiDARs; M2DP loses specificity when point cloud density varies significantly; STD cannot construct stable triangle descriptors when the scanning mode changes. DVMM demonstrates superior performance when matching Velodyne and Ouster, but its accuracy drops considerably in other scenarios, likely due to the challenges posed by large-scale and highly complex urban environments.
In contrast, the proposed method, with its meticulously designed triangle descriptors based on curved voxels, effectively captures the structural characteristics of the environment and enhances the generalization ability of cross-LiDAR place recognition, achieving superior performance. Figure illustrates the place recognition results using Ouster 128 as the reference. The results indicate that CVAT maintains stable performance even when the LiDAR FOV narrows or when point cloud density and quantity decrease.
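A minimal sketch of why triangle descriptors generalize across LiDARs: sorted, quantized side lengths are invariant to rotation, translation, and scan pattern, so they can index a hash table and matching reduces to key lookups. This is a simplified, STD-style sketch with illustrative names and `resolution` value; the paper's descriptor additionally encodes the local occupancy of the vertices from the curved-voxel map.

```python
import numpy as np
from itertools import combinations

def triangle_keys(keypoints, resolution=0.5):
    """Map each keypoint triple to a rotation/translation-invariant key:
    its three side lengths, sorted and quantized to `resolution` meters."""
    keys = {}
    for i, j, k in combinations(range(len(keypoints)), 3):
        a = np.linalg.norm(keypoints[i] - keypoints[j])
        b = np.linalg.norm(keypoints[j] - keypoints[k])
        c = np.linalg.norm(keypoints[i] - keypoints[k])
        key = tuple(int(s / resolution) for s in sorted((a, b, c)))
        keys.setdefault(key, []).append((i, j, k))
    return keys

# The same scene observed under a 90° yaw and a translation
# produces identical triangle keys.
kp = np.array([[0.0, 0, 0], [4, 0, 0], [0, 3, 0], [1, 1, 2]])
Rz = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
kp_moved = kp @ Rz.T + np.array([1.0, 2.0, 0.0])
print(set(triangle_keys(kp)) == set(triangle_keys(kp_moved)))
```

Since the keys ignore pose entirely, even a 180° viewpoint difference between query and reference scans leaves the lookup unchanged; only the vertex geometry of the scene matters.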
For a place recognition system, the ability to reliably detect long-distance loop closures while maintaining overall performance is a key indicator of robustness. In this experiment, the Velodyne VLP16 and Livox Avia LiDARs from the KAIST05 sequence are used to evaluate this capability by varying the recognition threshold for positive samples. Three thresholds (3 m, 10 m, and 15 m) are tested, with the results presented in Fig. 8. As the threshold distance increases, the performance of SOLiD and STD declines rapidly, whereas the precision-recall curve of the proposed method degrades much more slowly, demonstrating strong robustness.
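The threshold-varying evaluation above can be sketched as a standard precision-recall computation in which a retrieval counts as correct only if the matched place lies within the chosen distance of the ground-truth revisit. This is a common evaluation-protocol sketch with hypothetical names, not necessarily the paper's exact scoring.

```python
import numpy as np

def precision_recall(scores, dists, dist_thresh, n_true_loops):
    """Precision/recall pairs over descending similarity-score cutoffs.
    A retrieval is a true positive when its matched place lies within
    dist_thresh meters of the query's ground-truth revisit."""
    order = np.argsort(-scores)            # sweep cutoffs high-to-low
    correct = dists[order] <= dist_thresh
    tp = np.cumsum(correct)
    fp = np.cumsum(~correct)
    precision = tp / (tp + fp)
    recall = tp / max(n_true_loops, 1)
    return precision, recall

scores = np.array([0.9, 0.8, 0.7, 0.6])    # retrieval similarity scores
dists = np.array([2.0, 12.0, 4.0, 20.0])   # meters to ground-truth revisit
print(precision_recall(scores, dists, dist_thresh=3.0, n_true_loops=2))
print(precision_recall(scores, dists, dist_thresh=10.0, n_true_loops=2))
```

Loosening `dist_thresh` (3 m to 10 m here) converts borderline retrievals into true positives, which is exactly the regime where a curve that "degrades slowly" indicates the method still ranks genuinely nearby places above false matches.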