Research Introduction

Stereo Confidence Estimation via Locally Adaptive Fusion and Knowledge Distillation

  • College of AI Convergence
  • 2022-09-06

The paper "Stereo Confidence Estimation via Locally Adaptive Fusion and Knowledge Distillation" from the lab of Professor Sunok Kim (김선옥) has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). IEEE TPAMI is one of the world's most prestigious journals in artificial intelligence, with an impact factor of 24.3. Professor Sunok Kim led the work as first author, in collaboration with the Swiss Federal Institute of Technology in Lausanne (EPFL), Yonsei University, Korea University, and Ewha Womans University.

[Figure 1] Network configuration in the knowledge distillation framework

Stereo confidence estimation aims to estimate the reliability of the disparity produced by stereo matching. Unlike previous methods that exploit a limited input modality, we present a novel method that estimates the confidence map of an initial disparity by making full use of tri-modal input, including the matching cost, disparity, and color image, through deep networks. The proposed network, termed Locally Adaptive Fusion Networks (LAF-Net), learns locally varying attention and scale maps to fuse the tri-modal confidence features. Moreover, we propose a knowledge distillation framework to learn more compact confidence estimation networks as student networks. By transferring knowledge from LAF-Net as the teacher network, student networks that take only a disparity as input can achieve comparable performance. To transfer more informative knowledge, we also propose a module that learns a locally varying temperature in the softmax function. We further extend this framework to a multiview scenario. Experimental results show that LAF-Net and its variations outperform state-of-the-art stereo confidence methods on various benchmarks.
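
As a rough illustration of the fusion idea described above, the PyTorch sketch below combines features from the three modalities with per-pixel softmax attention weights. It is a minimal sketch under assumed branch architectures, channel counts, and class name (LocallyAdaptiveFusion); it is not the authors' implementation, and in particular the scale inference and recursive refinement parts of LAF-Net are omitted.

```python
# Minimal sketch of locally adaptive fusion of tri-modal confidence features.
# Branch structures, channel counts, and names are assumptions for illustration.
import torch
import torch.nn as nn


class LocallyAdaptiveFusion(nn.Module):
    """Fuses cost-, disparity-, and image-branch features with
    per-pixel (locally varying) attention weights."""

    def __init__(self, num_disp: int = 64, channels: int = 32):
        super().__init__()
        # One lightweight encoder per modality (assumed structure).
        self.cost_branch = nn.Conv2d(num_disp, channels, 3, padding=1)
        self.disp_branch = nn.Conv2d(1, channels, 3, padding=1)
        self.img_branch = nn.Conv2d(3, channels, 3, padding=1)
        # Attention head predicts one fusion weight map per modality.
        self.attention = nn.Conv2d(3 * channels, 3, 3, padding=1)
        # Confidence head maps the fused feature to a per-pixel score in [0, 1].
        self.head = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, cost_volume, disparity, image):
        f_cost = torch.relu(self.cost_branch(cost_volume))   # B x C x H x W
        f_disp = torch.relu(self.disp_branch(disparity))     # B x C x H x W
        f_img = torch.relu(self.img_branch(image))           # B x C x H x W
        feats = torch.stack([f_cost, f_disp, f_img], dim=1)  # B x 3 x C x H x W
        # Softmax over the modality axis yields locally varying fusion weights.
        attn = self.attention(torch.cat([f_cost, f_disp, f_img], dim=1))
        attn = torch.softmax(attn, dim=1).unsqueeze(2)        # B x 3 x 1 x H x W
        fused = (attn * feats).sum(dim=1)                     # B x C x H x W
        return self.head(fused)                               # B x 1 x H x W confidence


# Example: confidence for a 48 x 64 disparity map (random tensors as placeholders).
net = LocallyAdaptiveFusion()
confidence = net(torch.rand(1, 64, 48, 64), torch.rand(1, 1, 48, 64), torch.rand(1, 3, 48, 64))
```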

[Figure 2] Confidence maps on the KITTI 2015 dataset

We presented LAF-Net, which estimates confidence from tri-modal input, including the matching cost, disparity, and color image, through deep networks. The key idea of the proposed method is to design locally adaptive attention and scale inference networks that generate optimal fusion weights, and the confidence estimation performance is further improved with recursive refinement networks. In addition, we presented an effective confidence estimator obtained through knowledge distillation, using the tri-modal LAF-Net as the teacher and learning a locally varying temperature that transfers more informative knowledge to the student networks. The resulting student achieves accuracy competitive with the teacher while using simpler networks. We further applied the proposed framework to multiview stereo confidence estimation, demonstrating its generalization ability. A direction for further study is to examine how confidence estimation networks could be learned in an unsupervised manner.
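
To make the locally varying temperature concrete, the sketch below shows one way a per-pixel temperature map could soften a binary (correct/incorrect) confidence distillation loss in PyTorch. The function name, the sigmoid/cross-entropy form, and the T² weighting are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of distillation with a locally varying temperature. Assumes the
# teacher (LAF-Net) and the disparity-only student output per-pixel logits and
# that a small auxiliary network predicts the temperature map; all names and
# the exact loss form are assumptions for illustration.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature_map, eps=1e-6):
    """Per-pixel distillation of binary confidence, softened by a spatially
    varying temperature T(x, y); a higher T transfers softer teacher scores."""
    t = temperature_map.clamp(min=eps)                                # B x 1 x H x W
    p_teacher = torch.sigmoid(teacher_logits / t)                     # softened teacher
    p_student = torch.sigmoid(student_logits / t).clamp(eps, 1 - eps) # softened student
    # Per-pixel cross-entropy between the softened predictions; the T^2 factor
    # keeps gradient magnitudes comparable across temperatures (standard KD trick).
    ce = F.binary_cross_entropy(p_student, p_teacher, reduction="none")
    return (t ** 2 * ce).mean()


# Example: the student sees only the disparity at test time, so distillation is
# what lets it approach the tri-modal teacher (random tensors as placeholders).
loss = distillation_loss(torch.randn(1, 1, 48, 64),
                         torch.randn(1, 1, 48, 64),
                         torch.rand(1, 1, 48, 64) + 0.5)
```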