-----------------------------------------------------------------------------
DATABASE SEMINAR
------------------------------------------------------------------------------
Paolo Papotti
Qatar Computing Research Institute
------------------------------------------------------------------------------
NADEEF: A Generalized Data Cleaning System
------------------------------------------------------------------------------
Wednesday, 11 September 2013, 10:00
**** Meeting Room (Sala Riunioni) ****
Department of Engineering
Università Roma Tre
Via Vasca Navale, 79
piano piano
------------------------------------------------------------------------------
ABSTRACT

Data cleaning is an important problem, and data quality rules are the most
promising way to address it declaratively. Previous work has focused on
specific formalisms, such as functional dependencies (FDs), conditional
functional dependencies (CFDs), and matching dependencies (MDs), and these
have always been studied in isolation. Moreover, such techniques are usually
applied in a pipeline or interleaved.

In this work we tackle the problem with a novel system, NADEEF. NADEEF is an
extensible, generic, and easy-to-deploy data cleaning system that separates a
programming interface from a core in order to achieve generality and
extensibility. The programming interface can be used to express many types of
data quality rules beyond the well-known CFDs, MDs, and ETL rules. The core
algorithms can interleave multiple types of rules to detect and repair data
errors. This holistic view of the conflicts is the starting point for a novel
definition of repair context, which allows us to compute repairs of better
quality than previous approaches in the literature.
------------------------------------------------------------------------------
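As a rough illustration of the split between a programming interface and a core, the Python sketch below assumes a hypothetical `QualityRule` interface in which each rule only implements `detect`, and a hypothetical `detect_all` core that runs rules of different types (here an FD and a constant-CFD-style rule) over the same table and pools their violations. None of these names come from NADEEF, and the sketch covers detection only, not repair or the repair-context computation described in the abstract.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]  # one tuple of the table: attribute -> value


@dataclass(frozen=True)
class Violation:
    rule: str                   # name of the violated rule
    tuple_ids: Tuple[int, ...]  # positions of the offending tuples


class QualityRule(ABC):
    """Hypothetical rule interface: a rule only says how to detect its violations."""

    name: str

    @abstractmethod
    def detect(self, table: List[Row]) -> List[Violation]:
        ...


class FunctionalDependency(QualityRule):
    """FD lhs -> rhs: tuples that agree on lhs must agree on rhs (pairwise check)."""

    def __init__(self, lhs: Sequence[str], rhs: str) -> None:
        self.lhs = tuple(lhs)
        self.rhs = rhs
        self.name = f"FD {','.join(self.lhs)} -> {rhs}"

    def detect(self, table: List[Row]) -> List[Violation]:
        found = []
        for i, j in combinations(range(len(table)), 2):
            t1, t2 = table[i], table[j]
            if all(t1[a] == t2[a] for a in self.lhs) and t1[self.rhs] != t2[self.rhs]:
                found.append(Violation(self.name, (i, j)))
        return found


class ConstantRule(QualityRule):
    """Single-tuple rule in the spirit of a constant CFD: if attr = value then target = expected."""

    def __init__(self, attr: str, value: str, target: str, expected: str) -> None:
        self.attr, self.value = attr, value
        self.target, self.expected = target, expected
        self.name = f"CFD [{attr}={value}] -> [{target}={expected}]"

    def detect(self, table: List[Row]) -> List[Violation]:
        return [
            Violation(self.name, (i,))
            for i, t in enumerate(table)
            if t[self.attr] == self.value and t[self.target] != self.expected
        ]


def detect_all(rules: List[QualityRule], table: List[Row]) -> List[Violation]:
    """Hypothetical core: runs every rule, whatever its type, and pools the violations."""
    return [v for rule in rules for v in rule.detect(table)]


# Illustrative data, made up for this sketch.
table: List[Row] = [
    {"zip": "00146", "city": "Rome", "country": "IT", "prefix": "+39"},
    {"zip": "00146", "city": "Milan", "country": "IT", "prefix": "+39"},
    {"zip": "10001", "city": "New York", "country": "US", "prefix": "+1"},
    {"zip": "20121", "city": "Milan", "country": "IT", "prefix": "+33"},
]
rules: List[QualityRule] = [
    FunctionalDependency(["zip"], "city"),
    ConstantRule("country", "IT", "prefix", "+39"),
]
violations = detect_all(rules, table)
for v in violations:
    print(v.rule, v.tuple_ids)
# FD zip -> city (0, 1)
# CFD [country=IT] -> [prefix=+39] (3,)
```

The FD flags the two tuples that share zip 00146 but disagree on the city, while the constant rule flags the Italian tuple with prefix +33. Because both rule types report to one core, their violations are available together rather than in a separate pass per formalism, which is the kind of holistic view the abstract refers to.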