Attention (machine learning)

Attention (Korean: 주의집중, 注意集中) is a machine learning technique that mimics human attention by referring back to the important parts of the input. It is used in transformers. Dot-product attention and multi-head attention are the most widely used variants.

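Attention enhances some parts of the input data while diminishing others. The motivation is that a network should focus more on the important parts of the data, even when they make up only a small portion of an image or sentence. Which parts of the data matter more than others depends on the context, and this is learned by gradient descent.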
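The weighting described above can be sketched with a single-head dot-product attention function. This is a minimal illustration rather than a reference implementation: the function names, the toy `keys`, `values`, and `query` arrays, and the scaling by the square root of the key dimension (following the formulation in "Attention Is All You Need") are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention for a single head.

    queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v)
    Returns (output, weights) with shapes (n_q, d_v) and (n_q, n_k).
    """
    d = queries.shape[-1]
    # Similarity of every query with every key.
    scores = queries @ keys.T / np.sqrt(d)
    # Each row of weights is non-negative and sums to 1.
    weights = softmax(scores, axis=-1)
    # Output is a weighted average of the values.
    return weights @ values, weights


# Hypothetical toy input: the query points in the same direction as the second key.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = np.array([[10.0], [20.0], [30.0]])
query = np.array([[0.0, 4.0]])

output, weights = dot_product_attention(query, keys, values)
print(weights)  # most of the weight lands on the second position
print(output)
```

The second input position receives most of the weight while the other two are suppressed, which is the enhancing and diminishing effect in miniature. In a trained network the queries, keys, and values would come from learned projections rather than hand-written arrays.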

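Attention-like mechanisms were introduced in the 1990s under names such as multiplicative modules, sigma pi units, and hypernetworks.^[1] Their flexibility comes from their role as "soft weights" that can change at runtime, in contrast to standard weights, which must remain fixed at runtime. Uses of attention include memory in fast weight controllers^[2] that can learn "internal spotlights of attention" (also known as transformers with "linearized self-attention"), neural Turing machines, reasoning tasks in differentiable neural computers, and language processing in transformers and LSTMs.^[3]^[4]^[5]^[6]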
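The distinction between standard weights and "soft weights" can be illustrated with a small sketch. Here the projection matrices `W_q` and `W_k` stand in for ordinary weights that stay fixed at runtime, while the attention matrix is recomputed from each input. All names, shapes, and the random data are hypothetical choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard weights: set by training, constant at runtime (random here for illustration).
W_q = rng.normal(size=(4, 4))
W_k = rng.normal(size=(4, 4))


def soft_weights(x):
    """Return the self-attention weight matrix computed from input x of shape (n, 4)."""
    q = x @ W_q
    k = x @ W_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Row-wise softmax with the usual max-subtraction for stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)


# Two different inputs pass through the same fixed W_q and W_k.
x_a = rng.normal(size=(3, 4))
x_b = rng.normal(size=(3, 4))

weights_a = soft_weights(x_a)
weights_b = soft_weights(x_b)

# The fixed matrices are unchanged, but the soft weights differ per input.
print(np.allclose(weights_a, weights_b))
```

The same `W_q` and `W_k` produce a different mixing matrix for each input, which is the sense in which soft weights change at runtime.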

Footnotes

[1] Yann Lecun (2020). Deep Learning course at NYU, Spring 2020, video lecture Week 6. Event occurs at 53:00. Retrieved 8 March 2022.
[2] Schmidhuber, Jürgen (1992). “Learning to control fast-weight memories: an alternative to recurrent nets”. Neural Computation 4 (1): 131–139.
[3] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (5 December 2017). “Attention Is All You Need”. arXiv:1706.03762 [cs.CL].
[4] Ramachandran, Prajit; Parmar, Niki; Vaswani, Ashish; Bello, Irwan; Levskaya, Anselm; Shlens, Jonathon (13 June 2019). “Stand-Alone Self-Attention in Vision Models”. arXiv:1906.05909 [cs.CV].
[5] Jaegle, Andrew; Gimeno, Felix; Brock, Andrew; Zisserman, Andrew; Vinyals, Oriol; Carreira, Joao (22 June 2021). “Perceiver: General Perception with Iterative Attention”. arXiv:2103.03206 [cs.CV].
[6] Ray, Tiernan. “Google's Supermodel: DeepMind Perceiver is a step on the road to an AI machine that could process anything and everything”. ZDNet. Retrieved 19 August 2021.

External links

Dan Jurafsky and James H. Martin (2022) Speech and Language Processing (3rd ed. draft, January 2022), ch. 10.4 Attention and ch. 9.7 Self-Attention Networks: Transformers
Alex Graves (4 May 2020), Attention and Memory in Deep Learning (video lecture), DeepMind / UCL, via YouTube
Rasa Algorithm Whiteboard - Attention via YouTube
