Monday, 4 May 2015, 11:00 - 12:30, Room N4

Inferencing in Information Extraction: Techniques and Applications (Part I)

Prof. Denilson Barbosa (University of Alberta)

Abstract: Information extraction at Web scale has become one of the most important research topics in data management since major commercial search engines started incorporating knowledge into their search results a few years ago. Users increasingly expect structured knowledge as answers to their search needs. Using Bing as an example, the result page for “Lionel Messi” is full of structured knowledge facts, such as his birthday and awards. The research efforts towards improving the accuracy and coverage of such knowledge bases have led to significant advances in information extraction techniques. As the initial challenge of accurately extracting facts for popular entities is being addressed, more difficult challenges have emerged, such as extending knowledge coverage to long-tail entities and domains, understanding the interestingness and usefulness of facts within a given context, and addressing information-seeking needs more directly and accurately. In this tutorial, we will survey the recent research efforts and provide an introduction to the techniques that address those challenges, and to the applications that benefit from the adoption of those techniques. In particular, this tutorial will focus on a variety of techniques that can be broadly viewed as knowledge inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and derive new knowledge. More specifically, we focus on four main categories of inferencing techniques: 1) deep natural language processing using machine learning techniques, 2) data cleaning using integrity constraints, 3) large-scale probabilistic reasoning, and 4) leveraging human expertise for domain knowledge extraction.

Bio: Denilson Barbosa is an Associate Professor and the Director of the Science Internship Program in the Department of Computing Science, University of Alberta. He received a PhD from the University of Toronto (2005), working on Web data management. Since then he has worked on databases, the Web, information retrieval, and natural language processing, with recent emphasis on information extraction from semi-structured and unstructured data, having published widely in the top conferences and journals in these areas. He has served as a Program Committee member of all top database and Web conferences on several occasions, and as Program co-Chair of the 2015 Canadian AI Conference, the 1st and 2nd ACM SIGMOD Workshop on Databases and Social Networks (co-located with SIGMOD 2011 and 2012), the 3rd International Workshop on Data Engineering Meets the Semantic Web (co-located with ICDE 2012), and the 5th International XML Database Symposium (co-located with VLDB 2007). He also served as an Associate Editor of SIGMOD Record from 2010 to 2014 and as the ACM SIGMOD Information Director and Web Editor of the Record from 2006 to 2012. He is the recipient of an Alberta Ingenuity New Faculty Award, an IBM Faculty Award, and the Best Paper Award at the 2010 IEEE Conference on Data Engineering, and he supervised the recipients of the Best Undergraduate Poster Award at the 2012 ACM SIGMOD Conference. He was a Visiting Scientist (Gastwissenschaftler) at the Max Planck Institute for Informatics, Germany, from July 2014 to April 2015, and a Visiting Professor (BIT) at the Free University of Bozen-Bolzano, Italy, during the summer of 2008.
He was a principal investigator and the Leader of the Data Quality Theme of the NSERC Business Intelligence Network.
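As a loose illustration of the "data cleaning using integrity constraints" category mentioned in the abstract (not part of the talk material), the sketch below assumes a toy constraint that an entity has exactly one value for an attribute such as birth_date, and resolves conflicting extractions from multiple hypothetical sources by keeping the value with the highest aggregate extractor confidence:

from collections import defaultdict

# Hypothetical extractions: (entity, attribute, value, extractor confidence).
# The sources and confidence scores are invented for illustration only.
extractions = [
    ("Lionel Messi", "birth_date", "1987-06-24", 0.9),
    ("Lionel Messi", "birth_date", "1987-06-24", 0.7),
    ("Lionel Messi", "birth_date", "1986-06-24", 0.4),  # noisy source
]

def resolve(extractions):
    """Enforce 'one value per (entity, attribute)' by summing confidences
    per candidate value and keeping the best-supported one."""
    scores = defaultdict(float)
    for ent, attr, val, conf in extractions:
        scores[(ent, attr, val)] += conf
    resolved = {}
    for (ent, attr, val), score in scores.items():
        key = (ent, attr)
        if key not in resolved or score > resolved[key][1]:
            resolved[key] = (val, score)
    return {key: val for key, (val, _) in resolved.items()}

print(resolve(extractions))
# {('Lionel Messi', 'birth_date'): '1987-06-24'}

Real systems of the kind surveyed in the tutorial combine many more signals (probabilistic reasoning over extractors, human feedback, richer constraints); this sketch only shows the basic idea of using a constraint to reconcile conflicting extracted facts.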