Crawler.NET: A component-based distributed framework for web traversal

Abstract. In web search engines, collecting source documents is an indispensable function that must be performed periodically and at high speed. To this end, an extensible, component-based, loosely coupled distributed architecture for the .NET platform is presented that facilitates efficient parallel crawling. It combines the flexibility that is key to research with the scalability that high performance requires. The architecture comprises a lower layer that constitutes an execution environment and an upper layer that realizes a distributed crawler with a central coordinator.
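
The abstract does not spell out the crawler's internals, so the following is only a rough illustration of the general idea of a central coordinator feeding parallel workers. It is written in Python rather than .NET, runs in a single process with threads instead of distributed nodes, and uses a tiny in-memory link graph in place of real HTTP fetching. All names (Coordinator, fake_fetch, crawl, FAKE_WEB) are hypothetical and are not taken from Crawler.NET.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical in-memory "web": URL -> outgoing links (stands in for real fetching).
FAKE_WEB: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def fake_fetch(url: str) -> List[str]:
    """Pluggable fetch component: returns the links found at `url`."""
    return FAKE_WEB.get(url, [])


class Coordinator:
    """Central coordinator: owns the URL frontier and the set of seen URLs."""

    def __init__(self, seeds: Iterable[str]) -> None:
        self.frontier = queue.Queue()  # URLs waiting to be crawled
        self.seen: Set[str] = set()    # URLs already handed out (deduplication)
        self._lock = threading.Lock()
        self.submit(seeds)

    def submit(self, urls: Iterable[str]) -> None:
        """Enqueue each URL that has not been seen before."""
        with self._lock:
            for url in urls:
                if url not in self.seen:
                    self.seen.add(url)
                    self.frontier.put(url)


def worker(coord: Coordinator, fetch: Callable[[str], List[str]]) -> None:
    """Crawl worker: takes a URL, fetches it, reports discovered links back."""
    while True:
        url = coord.frontier.get()
        try:
            if url is None:  # sentinel: no more work
                return
            coord.submit(fetch(url))
        finally:
            coord.frontier.task_done()


def crawl(seeds: Iterable[str], fetch: Callable[[str], List[str]],
          n_workers: int = 3) -> List[str]:
    """Run the crawl to completion and return all discovered URLs, sorted."""
    coord = Coordinator(seeds)
    threads = [threading.Thread(target=worker, args=(coord, fetch))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    coord.frontier.join()          # wait until every queued URL is processed
    for _ in threads:
        coord.frontier.put(None)   # one stop sentinel per worker
    for t in threads:
        t.join()
    return sorted(coord.seen)
```

Calling crawl(["a"], fake_fetch) visits every page reachable from "a" exactly once. Because the fetch function is passed in as a parameter, a different component (for example a real HTTP fetcher) could be substituted without touching the coordinator, which loosely mirrors the component-based design the abstract mentions.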

Published in

  • Proceedings of microCAD 2007, section Applied information technology (section M). Paper (PDF), presentation (PDF).
  • Hungarian Students' Scientific Conference (TDK), Budapest University of Technology and Economics (BME), Faculty of Electrical Engineering and Information Technology (VIK), November 17, 2006, Software section. Paper (PDF), presentation (PDF).

Co-developed with Péter Pallos.

Availability. The project is available for download at SourceForge.