Apache Lucene

Apache Lucene
Developer	Apache Software Foundation
Initial release	1999
Stable release	9.5.0 / 30 January 2023
Repository	github.com/apache/lucene
Written in	Java
Operating system	Cross-platform
Type	Information retrieval library
License	Apache License 2.0
Website	http://lucene.apache.org/

Apache Lucene is a free and open-source information retrieval library written in Java, originally developed by Doug Cutting. It is supported by the Apache Software Foundation and released under the Apache License.

Through further development, Lucene has been ported from Java to other programming languages, including Object Pascal, Perl, C#, C++, Python, Ruby, and PHP.^[2]

History

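Doug Cutting originally wrote Lucene in 1999.^[3] It was initially available for download from the SourceForge web site. It joined the Apache Software Foundation's Jakarta project in September 2001 and became its own top-level Apache project in February 2005. The name Lucene is Doug Cutting's wife's middle name and her maternal grandmother's first name.^[4]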

Lucene formerly included several sub-projects, such as Lucene.NET, Apache Mahout, Tika, and Nutch. These are now independent top-level projects.

In March 2010, the Apache Solr search server joined Lucene as a sub-project, and the two developer communities merged.

Features

Lucene is suitable for any application that requires full-text indexing and search, but it is best known for its use in implementing web search engines and local, single-site search.^[5]^[6]

Lucene can perform fuzzy searches based on edit distance.^[7]
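To make the idea of edit-distance matching concrete, here is a minimal sketch in plain Java. It does not use or reproduce Lucene's own implementation: the class `FuzzyMatchSketch`, its method names, and the sample term list are all hypothetical, and the code simply scans a list of terms and keeps those within a given number of single-character edits of the query (plain Levenshtein distance).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Lucene API.
final class FuzzyMatchSketch {

    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute-force fuzzy match: keep every term within maxEdits of the query.
    static List<String> fuzzyMatch(String query, List<String> terms, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("roam", "foam", "roams", "room", "search");
        // prints [roam, foam, roams, room]
        System.out.println(fuzzyMatch("roam", terms, 1));
    }
}
```

A linear scan like this is only practical for small term lists; it is meant to show what "within a given edit distance" means, not how a search library would evaluate such a query efficiently.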

Lucene has also been used to implement recommendation systems.^[8] For example, Lucene's 'MoreLikeThis' class can generate recommendations for similar documents.
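The general idea of recommending "similar documents" can be sketched without Lucene at all. The example below is not the algorithm used by 'MoreLikeThis'; it swaps in a much simpler technique, Jaccard similarity over sets of lowercase word tokens, purely to illustrate ranking candidates by how much vocabulary they share with a seed document. The class `SimilarDocsSketch` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not Lucene's MoreLikeThis.
final class SimilarDocsSketch {

    // Split text into a set of lowercase word tokens.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Jaccard similarity: |A intersect B| / |A union B|, in the range [0, 1].
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    // Index of the candidate most similar to the seed, or -1 if there are none.
    static int mostSimilar(String seed, String[] candidates) {
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < candidates.length; i++) {
            double score = jaccard(seed, candidates[i]);
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

For instance, given the seed text "lucene full text search library", the candidate "a java library for full text search" shares four of eight distinct tokens and would be ranked above candidates with no words in common.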

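At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility makes Lucene's API independent of file format. Text from PDF, HTML, Microsoft Word, mind map, and OpenDocument files, as well as many other formats (except images), can be indexed as long as the textual information can be extracted.^[9]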
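The document-with-fields model can be illustrated with a toy inverted index in plain Java. This is a sketch under stated assumptions, not Lucene's API or storage format: the class `FieldIndexSketch`, its `add` and `search` methods, and the field names in the usage note are hypothetical. A document is just a map from field name to already-extracted text, which is why the original file format does not matter to the index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the Lucene API.
final class FieldIndexSketch {

    // field name -> term -> ids of the documents containing that term in that field
    private final Map<String, Map<String, SortedSet<Integer>>> postings = new HashMap<>();
    private int nextId = 0;

    // Index a document given as field name -> extracted text; returns its id.
    int add(Map<String, String> document) {
        int id = nextId++;
        for (Map.Entry<String, String> field : document.entrySet()) {
            String[] terms = field.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            for (String term : terms) {
                if (term.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(term, k -> new TreeSet<>())
                        .add(id);
            }
        }
        return id;
    }

    // Ids of the documents whose given field contains the term (case-insensitive).
    SortedSet<Integer> search(String field, String term) {
        return postings
                .getOrDefault(field, Map.of())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }
}
```

With this sketch, a caller that has extracted a title and body from a PDF or an HTML page would call `add(Map.of("title", ..., "body", ...))` in exactly the same way, and `search("title", "lucene")` would return only the documents whose title field contains that term.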

Lucene-based projects

Lucene itself is a library that provides indexing and search; it does not include functionality such as web crawling or HTML parsing. However, several projects extend Lucene's capabilities:

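Apache Nutch - provides web crawling and HTML parsing
Apache Solr - an enterprise search server
Compass - the predecessor of Elasticsearch^[10]
CrateDB - an open-source, distributed SQL database built on Lucene^[11]
DocFetcher - a cross-platform desktop search application
Elasticsearch - an enterprise search server created in 2010^[12]
KinoSearch - a search engine written in Perl and C^[14] that is a loose port of Lucene^[13]. It is used by the Socialtext wiki^[14], the MojoMojo wiki engine^[15], the Human Metabolome Database (HMDB)^[16], and the Toxin and Toxin-Target Database (T3DB)^[17].
Swiftype - an enterprise search startup based on Lucene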

Users

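For a list of organizations that use Lucene directly, rather than through one of the projects built on it, see Lucene's "Powered By" page.^[18] For example, Twitter uses Lucene for its real-time search.^[19]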

References

[1] "Welcome to Apache Lucene". Lucene™ News section. Archived from the original on 12 February 2020. Retrieved 12 February 2020.
[2] "LuceneImplementations". apache.org. Retrieved 23 September 2015.
[3] "Better Search with Apache Lucene and Solr" (PDF). 19 November 2007. Archived from the original (PDF) on 31 January 2012.
[4] Barker, Deane (2016). Web Content Management. O'Reilly. p. 233. ISBN 1491908106.
[5] McCandless, Michael; Hatcher, Erik; Gospodnetić, Otis (2010). Lucene in Action, Second Edition. Manning. p. 8. ISBN 1933988177.
[6] "GNU/Linux Semantic Storage System" (PDF). glscube.org. Archived from the original (PDF) on 1 June 2010.
[7] "Apache Lucene - Query Parser Syntax". lucene.apache.org. Archived from the original on 2 May 2017.
[8] J. Beel, S. Langer, and B. Gipp, "The Architecture and Datasets of Docear's Research Paper Recommender System," in Proceedings of the 3rd International Workshop on Mining Scientific Publications (WOSP 2014) at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014), London, UK, 2014.
[9] Perner, Petra (2007). Machine Learning and Data Mining in Pattern Recognition: 5th International Conference. Springer. p. 387. ISBN 978-3-540-73498-7.
[10] "The Future of Compass & Elasticsearch". the dude abides. Archived from the original on 15 October 2015. Retrieved 14 October 2015.
[11] Wayner, Peter. "11 cutting-edge databases worth exploring now". InfoWorld. Archived from the original on 21 September 2015. Retrieved 21 September 2015.
[12] "Elasticsearch: RESTful, Distributed Search & Analytics - Elastic". elastic.co. Archived from the original on 8 October 2015. Retrieved 23 September 2015.
[13] Marvin Humphrey. "KinoSearch - Search engine library. - metacpan.org". p3rl.org. Retrieved 23 September 2015.
[14] Natividad, Angela. "Socialtext Updates Search, Goes Kino". CMS Wire. Archived from the original on 29 September 2012. Retrieved 31 May 2011.
[15] Diment, Kieren; Trout, Matt S (2009). "Catalyst Cookbook". The Definitive Guide to Catalyst. Apress. p. 280. ISBN 978-1-4302-2365-8.
[16] "HMDB: a knowledgebase for the human metabolome". Nucleic Acids Res. 37 (Database issue): D603–10. January 2009. doi:10.1093/nar/gkn810. PMC 2686599. PMID 18953024.
[17] "T3DB: a comprehensively annotated database of common toxins and their targets". Nucleic Acids Res. 38 (Database issue): D781–6. January 2010. doi:10.1093/nar/gkp934. PMC 2808899. PMID 19897546.
[18] "PoweredBy". apache.org. Archived from the original on 21 September 2015. Retrieved 23 September 2015.
[19] MG Siegler. "Twitter Quietly Launched A New Search Backend Weeks Ago". TechCrunch. AOL. Archived from the original on 25 September 2015. Retrieved 23 September 2015.

External links

Official website: http://lucene.apache.org/
