Tutorial: Distributed Web Retrieval
Tutorial id: Invited2
Presenter: Ricardo Baeza-Yates
Yahoo! Research
Barcelona, Spain


In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of Web sites continues to grow rapidly (over 270 million at the beginning of 2011), and there are currently more than 20 billion indexed pages. At the same time, there are more than one billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, suggesting the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this tutorial we present the architecture of current search engines and explore the main challenges behind the design of all the processes of a distributed Web retrieval system: crawling, indexing, and query processing.


IIIT Bangalore