SEMINAR SERIES

Pierre Senellart
Telecom ParisTech

"Web Data Management"

The Web has revolutionized access to information. This series of seminars will cover the theme of Web data management, i.e., how to acquire, properly model, store, and analyze data from the World Wide Web. We will first give an overview of the technologies used by Web search engines to crawl, index, and rank Web pages according to users' queries. This can only be done by using scalable approaches to distributed storage and computing, which will be presented in a second lecture about the MapReduce framework. Semantic Web technologies propose a workable model for representing semantic information from the Web, either explicitly annotated as such or automatically extracted from existing resources; this will be the focus of the third lecture. Finally, we will conclude with an introduction to probabilistic data management, especially discussing probabilistic XML data models, which are well suited to the representation, querying, and updating of the inherent uncertainty contained in Web data.

Agenda:
- 18 Sept 2013, Sala Riunioni Dia, 10:00 - 11:00 "Web Search"
- 20 Sept 2013, Sala Riunioni Dia, 10:00 - 11:00 "MapReduce: Distributed Computing at Large Scale"
- 23 Sept 2013, Sala Riunioni Dia, 10:00 - 11:00 "Semantic Web Technologies"
- 26 Sept 2013, Sala Riunioni Dia, 10:00 - 11:00 "Probabilistic XML: A Data Model for the Web"

References:
- S. Abiteboul, I. Manolescu, M.-C. Rousset, P. Rigaux, and P. Senellart, Web Data Management. Cambridge University Press, New York, USA, January 2012. Freely available at http://webdam.inria.fr/Jorge/
- B. Kimelfeld and P. Senellart, Probabilistic XML: Models and Complexity. In Z. Ma and L. Yan, editors, Advances in Probabilistic Databases for Uncertain Information Management, pp. 39–66. Springer-Verlag, May 2013.

Dr.
Pierre Senellart is an Associate Professor in the DBWeb team at Télécom ParisTech, the leading French engineering school specializing in information technology. He is an alumnus of the École normale supérieure and obtained his M.Sc. (2003) and his Ph.D. (2007) in computer science from Université Paris-Sud, studying under the supervision of Serge Abiteboul. He was awarded a Habilitation à diriger les recherches by Université Pierre et Marie Curie in 2012. Pierre Senellart has published articles in internationally renowned conferences and journals (PODS, AAAI, VLDB Journal, Journal of the ACM, etc.). He has been a member of the program committee and participated in the organization of various international conferences and workshops (including PODS, WWW, VLDB, SIGMOD, ICDE). He is also the Information Director of the Journal of the ACM. His research interests focus on theoretical aspects of database management systems and the World Wide Web, and more specifically on the intensional indexing of the deep Web, probabilistic XML databases, and graph mining. He also has an interest in natural language processing and has been collaborating with SYSTRAN, the leading machine translation company.