Abstract. In web search engines, collecting source documents is an indispensable task that must be performed periodically and at high speed. To this end, an extensible, component-based, loosely coupled distributed architecture for the .NET platform is presented that facilitates efficient parallel crawling. It combines the flexibility essential for research with the scalability required for high performance. The architecture comprises a lower layer that constitutes an execution environment and an upper layer that realizes a distributed crawler with a central coordinator.
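The abstract does not say how the central coordinator distributes work among crawler nodes. As a rough illustration only, here is a minimal sketch in Python (not the project's .NET code) that assumes host-based partitioning: the coordinator owns a deduplicated URL frontier and routes every URL of a given host to the same worker. The class `Coordinator` and its methods `submit`, `worker_for` and `next_for` are hypothetical names chosen for this example. Fetching, parsing and the execution-environment layer are left out.

```python
from __future__ import annotations

import hashlib
from collections import deque
from urllib.parse import urlsplit


class Coordinator:
    """Hypothetical central coordinator (illustration only): owns the URL
    frontier and the set of already scheduled URLs, and hands work to a
    fixed number of crawler workers."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers
        # One FIFO queue per worker.
        self.queues: list[deque[str]] = [deque() for _ in range(num_workers)]
        # URLs already scheduled, used for deduplication.
        self.seen: set[str] = set()

    def worker_for(self, url: str) -> int:
        # Stable hash of the host name (Python's built-in hash() is randomized
        # per process), so all URLs of one host map to the same worker.
        host = urlsplit(url).hostname or ""
        digest = hashlib.sha1(host.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_workers

    def submit(self, url: str) -> bool:
        """Schedule a URL reported by any worker; return False for duplicates."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queues[self.worker_for(url)].append(url)
        return True

    def next_for(self, worker_id: int) -> str | None:
        """Return the worker's next URL, or None if its queue is empty."""
        queue = self.queues[worker_id]
        return queue.popleft() if queue else None
```

Partitioning by host keeps per-host politeness (request spacing) local to a single worker, and a central coordinator makes deduplication trivial. The cost is that the coordinator becomes a potential bottleneck. Whether the presented architecture makes the same trade-off is not stated here.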
Published in
- Proceedings of microCAD 2007, section Applied information technology (section M)
- Hungarian Students' Scientific Conference (TDK), Budapest University of Technology and Economics (BME), Faculty of Electrical Engineering and Information Technology (VIK), November 17, 2006, Software section
Co-developed with Péter Pallos.
Availability. The project is available for download at SourceForge.