Feature selection

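Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Stylometry and DNA microarray analysis are two cases where feature selection is used. It should be distinguished from feature extraction.^[1]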

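Feature selection techniques are used for several reasons:

to simplify models so that researchers and users can interpret them more easily^[2]
to shorten training times^[3]
to avoid the curse of dimensionality^[4]
to improve the compatibility of the data with a learning model class^[5]
to encode inherent symmetries present in the input space^[6]^[7]^[8]^[9]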

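The central premise when using a feature selection technique is that the data contain some features that are either redundant or irrelevant, and that these can therefore be removed without incurring much loss of information.^[10] Redundancy and irrelevance are two distinct notions, since one relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.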
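The difference between a redundant feature and an irrelevant one can be illustrated with a small synthetic example. The following is a minimal sketch, not a method taken from the cited sources: the data, the names `x1`, `x2`, `x3`, the helper `select_features`, and both thresholds are assumptions chosen for illustration. Here `x2` is relevant but nearly duplicates `x1`, while `x3` is unrelated to the target.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_features(features, target, min_relevance=0.3, max_redundancy=0.95):
    """Hypothetical greedy filter: visit features from most to least
    correlated with the target, skipping irrelevant ones and ones that
    nearly duplicate a feature that was already kept."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if relevance[name] < min_relevance:
            continue  # irrelevant: carries little information about the target
        if any(abs(pearson(features[name], features[kept])) > max_redundancy
               for kept in selected):
            continue  # redundant: relevant, but a kept feature already covers it
        selected.append(name)
    return selected


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]              # relevant
x2 = [v + 0.05 * rng.gauss(0, 1) for v in x1]         # relevant, near-copy of x1
x3 = [rng.gauss(0, 1) for _ in range(n)]              # irrelevant noise
y = [2 * v + 0.5 * rng.gauss(0, 1) for v in x1]       # target depends on x1 only

features = {"x1": x1, "x2": x2, "x3": x3}
selected = select_features(features, y)
print(selected)  # one of x1 / x2 survives; x3 is dropped
```

Both `x1` and `x2` pass the relevance check, but only whichever is visited first is kept, because the other adds almost nothing once its near-copy is present. `x3` fails the relevance check on its own, regardless of the other features.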

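Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points).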
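The contrast between selection and extraction can be made concrete with a toy data set. This is a minimal sketch under stated assumptions: the matrix `X`, the target `y`, and the helpers `select_k_best` and `extract_summary` are hypothetical, the correlation-based ranking is just one simple scoring choice, and the row mean and range stand in for any function of the original features.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_k_best(rows, target, k):
    """Feature selection: keep the k original columns that are most
    correlated (in absolute value) with the target."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    scores = [abs(pearson(col, target)) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:k])
    return kept, [[row[j] for j in kept] for row in rows]


def extract_summary(rows):
    """Feature extraction: build new features (row mean and row range)
    that are functions of all the original features."""
    return [[sum(row) / len(row), max(row) - min(row)] for row in rows]


X = [
    [1.0, 5.0, 1.0, 2.0],
    [2.0, 3.0, 2.0, 2.0],
    [3.0, 6.0, 3.0, 2.0],
    [4.0, 2.0, 4.0, 2.0],
    [6.0, 4.0, 5.0, 2.0],
]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

kept, X_selected = select_k_best(X, y, k=2)
X_extracted = extract_summary(X)

print(kept)            # [0, 2]
print(X_selected[0])   # [1.0, 1.0]  (values copied unchanged from columns 0 and 2)
print(X_extracted[0])  # [2.25, 4.0] (new values that appear in no original column)
```

The selected matrix contains only values that already exist in `X`, so each retained column keeps its original meaning. The extracted matrix has the same number of columns in this example, but each one mixes information from every original feature.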

Footnotes

[1] Sarangi, Susanta; Sahidullah, Md; Saha, Goutam (September 2020). "Optimization of data-driven filterbank for automatic speaker verification". Digital Signal Processing 104: 102795. arXiv:2007.10729. doi:10.1016/j.dsp.2020.102795. S2CID 220665533.
[2] Gareth James; Daniela Witten; Trevor Hastie; Robert Tibshirani (2013). An Introduction to Statistical Learning. Springer. p. 204. Archived from the original on 23 June 2019. Retrieved 14 April 2024.
[3] Brank, Janez; Mladenić, Dunja; Grobelnik, Marko; Liu, Huan; Flach, Peter A.; Garriga, Gemma C.; Toivonen, Hannu (2011). "Feature Selection". In Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of Machine Learning. Boston, MA: Springer US. pp. 402–406. doi:10.1007/978-0-387-30164-8_306. ISBN 978-0-387-30768-8. Retrieved 13 July 2021.
[4] Kramer, Mark A. (1991). "Nonlinear principal component analysis using autoassociative neural networks". AIChE Journal 37 (2): 233–243. doi:10.1002/aic.690370209. ISSN 1547-5905.
[5] Kratsios, Anastasis; Hyndman, Cody (2021). "NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation". Journal of Machine Learning Research 22 (92): 1–51. ISSN 1533-7928.
[6] Persello, Claudio; Bruzzone, Lorenzo (July 2014). "Relevant and invariant feature selection of hyperspectral images for domain generalization". 2014 IEEE Geoscience and Remote Sensing Symposium (PDF). IEEE. pp. 3562–3565. doi:10.1109/igarss.2014.6947252. ISBN 978-1-4799-5775-0. S2CID 8368258.
[7] Hinkle, Jacob; Muralidharan, Prasanna; Fletcher, P. Thomas; Joshi, Sarang (2012). "Polynomial Regression on Riemannian Manifolds". In Fitzgibbon, Andrew; Lazebnik, Svetlana; Perona, Pietro; Sato, Yoichi; Schmid, Cordelia (eds.). Computer Vision – ECCV 2012. Lecture Notes in Computer Science 7574. Berlin, Heidelberg: Springer. pp. 1–14. arXiv:1201.2395. doi:10.1007/978-3-642-33712-3_1. ISBN 978-3-642-33712-3. S2CID 8849753.
[8] Yarotsky, Dmitry (30 April 2021). "Universal Approximations of Invariant Maps by Neural Networks". Constructive Approximation 55: 407–474. arXiv:1804.10306. doi:10.1007/s00365-021-09546-1. ISSN 1432-0940. S2CID 13745401.
[9] Hauberg, Søren; Lauze, François; Pedersen, Kim Steenstrup (1 May 2013). "Unscented Kalman Filtering on Riemannian Manifolds". Journal of Mathematical Imaging and Vision 46 (1): 103–120. doi:10.1007/s10851-012-0372-9. ISSN 1573-7683. S2CID 8501814.
[10] Bermingham, Mairead L.; et al. (2015). "Application of high-dimensional feature selection: evaluation for genomic prediction in man". Scientific Reports 5: 10312. Bibcode:2015NatSR...510312B. doi:10.1038/srep10312. PMC 4437376. PMID 25988841.

External links

Feature Selection Package, Arizona State University (Matlab Code)
NIPS challenge 2003 (see also NIPS)
Naive Bayes implementation with feature selection in Visual Basic, archived 2009-02-14 at the Wayback Machine (includes executable and source code)
Minimum-redundancy-maximum-relevance (mRMR) feature selection program
FEAST (Open source Feature Selection algorithms in C and MATLAB)
