Information extraction from webpages, social networks, news, and user interactions crucially relies on inferring the hidden parameters of interaction between entities. For instance, in factorization models for movie recommendation we are interested in the underlying hidden properties of users and movies, so as to suggest new movies. Likewise, when extracting topics from webpages we want to find the hidden topics representing documents and words. Finally, when modeling user behavior it is worthwhile to find the latent factors, cluster variables, causes, etc. that drive a user's interaction with websites.
All these problems can be described in a coherent statistical framework. While much has been published about how to deal with these problems at moderate sizes, there is little information available on how to perform efficient, scalable estimation at the scale of the internet. In this tutorial we present both the theory and the algorithms for doing so. In particular, we will describe inference algorithms for collaborative filtering, recommendation, latent Dirichlet allocation, and advanced clustering models.