Nutch
Nutch is an effort to build an open-source search engine. It uses Lucene for its search and indexing components. The fetcher (Web crawler) was written from scratch specifically for this project.
Nutch has a highly modular architecture allowing developers to create plugins for the following activities: media-type parsing, data retrieval, querying and clustering.
Tim O'Reilly has a seat on Nutch's board of directors.
Doug Cutting is the lead developer.
As of June 2005, Nutch has graduated from the Apache Incubator, and is now a subproject of Lucene.
It is written entirely in the Java programming language, but its data is stored in language-independent formats. In June 2003, a 100-million-page demo system was run successfully.
=External links=
*[http://lucene.apache.org/nutch/ Official page of the project]
*[http://www.lucene-consulting.com Nutch & Lucene Consulting] from Otis Gospodnetić, Lucene developer and Lucene in Action co-author.
*[http://www.objectssearch.com/services/index.html Nutch/Lucene Consultant] Offers Nutch/Lucene-based solutions.
*[http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=144&page=1 Building Nutch: Open Source Search] (2004) - ACM Queue vol. 2, no. 2
*[http://searchenginewatch.com/searchday/article.php/3071971 An article about Nutch] (2003) - Search Engine Watch
*[http://www.technewsworld.com/perl/story/31653.html Another article about Nutch] (2003) - Tech News World
*[http://wiki.media-style.com/display/nutchDocu/Home Unofficial documentation]