|Scaling Hadoop to handle Big Data|
|SPEAKER||Shanmugam Senthil leads the Hadoop engineering team at Yahoo! Bangalore. He works on building cloud infrastructure at Yahoo! to handle large volumes of data. He pioneered the idea of building a large-scale content-processing platform using Hadoop, which is now widely used within Yahoo!. At Sun Microsystems he worked on making several middleware platforms global-ready and spent years improving the localization process. Shanmugam Senthil holds a post-graduate degree (M.Tech) from IIT Kharagpur and a Bachelor of Engineering from Thiagarajar College of Engineering, Madurai.
Director, Yahoo India
|Talk Outline:||Hadoop is becoming the de-facto platform for processing big data. It has come a long way, from a 20-node prototype to 4000-node production clusters, in just a few years. HDFS (Hadoop Distributed File System) keeps the entire namespace and all block locations in memory, which limits the number of files and blocks it can address. Federated HDFS is a new feature that overcomes this limitation by allowing multiple independent namespaces (and NameNodes) to share the physical storage within a cluster. The Hadoop JobTracker schedules Map/Reduce jobs and coordinates messages between map and reduce tasks in a cluster. The JobTracker is being re-architected to handle larger clusters and to allow jobs that do not fit the MapReduce framework. We will present the high-level design of, and the motivation behind, these two major features.|
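To make the federation idea concrete, the sketch below shows how multiple independent NameNodes might be declared in hdfs-site.xml so that DataNodes register with all of them and share physical storage. This is a minimal illustration using Hadoop's standard federation properties (`dfs.nameservices` and per-nameservice `dfs.namenode.rpc-address`); the nameservice IDs (`ns1`, `ns2`) and host names are hypothetical.

```xml
<!-- hdfs-site.xml: sketch of a federated HDFS setup with two
     independent namespaces, each served by its own NameNode.
     Nameservice IDs and hosts below are illustrative assumptions. -->
<configuration>
  <!-- Declare the logical nameservices in the cluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- RPC address of the NameNode serving namespace ns1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <!-- RPC address of the NameNode serving namespace ns2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2.example.com:8020</value>
  </property>
</configuration>
```

Each DataNode in the cluster registers with both NameNodes and stores blocks for both namespaces, so the in-memory metadata limit applies per NameNode rather than to the cluster as a whole.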