Knowledge distillation

Knowledge distillation (Korean: 지식 증류, 知識蒸溜) is a technique for transferring ("distilling") the knowledge of one machine learning model into another.

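In machine learning, knowledge distillation, also called model distillation, is the process of transferring knowledge from a large model to a smaller one. Large models (for example, very deep neural networks or ensembles of many models) have a higher knowledge capacity than small models, but this capacity may not be fully utilized. Even when a model uses only a small part of its knowledge capacity, evaluating it can still be computationally expensive. Knowledge distillation aims to transfer the knowledge of the large model to a small model without loss of validity. Because the small model is cheaper to evaluate, it can be deployed on less powerful hardware (such as mobile devices).^[1]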
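The article does not spell out a training objective, so the following is only a minimal sketch of one common formulation, the temperature-softened "soft target" loss described by Hinton et al.^[1] The large model (teacher) and the small model (student) each produce logits; both are softened with a temperature before being compared, and the result is optionally mixed with the ordinary loss on the true label. The function names, the `temperature` and `alpha` parameters, and the example logits are all illustrative assumptions, not part of any particular library.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher to the softened student.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    following the scaling suggested by Hinton et al. (assumed here).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return temperature ** 2 * kl


def combined_loss(student_logits, teacher_logits, true_label,
                  temperature=2.0, alpha=0.5):
    """Weighted sum of the soft-target loss and the hard-label cross-entropy.

    alpha is a hypothetical mixing weight: 1.0 uses only the teacher signal,
    0.0 uses only the ground-truth label.
    """
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [5.0, 1.0, -2.0]   # made-up logits from a large model
    student = [2.0, 1.5, 0.0]    # made-up logits from a small model
    print(softmax(teacher, temperature=1.0))  # sharp distribution
    print(softmax(teacher, temperature=5.0))  # flatter "soft targets"
    print(combined_loss(student, teacher, true_label=0))
```

Raising the temperature flattens the teacher's output distribution, so the student also sees how the teacher ranks the incorrect classes rather than only its top prediction. In practice the loss would be minimized with an automatic differentiation framework; this sketch only shows how the quantity is computed.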

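Knowledge distillation has been used successfully in several application areas of machine learning, such as object detection,^[2] acoustic models,^[3] and natural language processing.^[4] More recently, it has also been introduced to graph neural networks, which are applicable to non-grid data.^[5]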

References

[1] Hinton, Geoffrey; Vinyals, Oriol; Dean, Jeff (2015). "Distilling the knowledge in a neural network". arXiv:1503.02531 [stat.ML].
[2] Chen, Guobin; Choi, Wongun; Yu, Xiang; Han, Tony; Chandraker, Manmohan (2017). "Learning efficient object detection models with knowledge distillation". Advances in Neural Information Processing Systems: 742–751.
[3] Asami, Taichi; Masumura, Ryo; Yamaguchi, Yoshikazu; Masataki, Hirokazu; Aono, Yushi (2017). "Domain adaptation of DNN acoustic models using knowledge distillation". IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 5185–5189.
[4] Cui, Jia; Kingsbury, Brian; Ramabhadran, Bhuvana; Saon, George; Sercu, Tom; Audhkhasi, Kartik; Sethy, Abhinav; Nussbaum-Thom, Markus; Rosenberg, Andrew (2017). "Knowledge distillation across ensembles of multilingual models for low-resource languages". IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 4825–4829.
[5] Yang, Yiding; Qiu, Jiayan; Song, Mingli; Tao, Dacheng; Wang, Xinchao (2020). "Distilling Knowledge from Graph Convolutional Networks". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 7072–7081. arXiv:2003.10477. Bibcode:2020arXiv200310477Y.
