-----------------------------------------------------------------------------
Prescriptive Data Cleaning
-----------------------------------------------------------------------------
Paolo Papotti
Qatar Computing Research Institute
-----------------------------------------------------------------------------
Friday, 12 June 2015, 14:30
Meeting Room, Department of Engineering - Computer Science and Automation Section
Università Roma Tre
Via Vasca Navale, 79 - 00146 Roma
-----------------------------------------------------------------------------
Data cleaning techniques usually rely on quality rules to identify violating tuples and then fix these violations with repair algorithms. Often, the rules encode business logic and can only be defined over a target report (a view) generated by transformations over multiple data sources. As a result, the violations detected in the report are decoupled in space and time from the actual sources of the errors. Moreover, a repair applied to the report would have to be repeated whenever the data sources change. Finally, even when repairing the report is possible and affordable, it does little to help identify and analyze the actual sources of the errors, and therefore to prevent future violations at the target.

In this talk, we present a system that addresses this decoupling. The system takes quality rules defined over the output of a transformation and computes explanations of the errors observed in the view, both at the view level, to describe these errors, and at the source level, to prescribe actions that fix them. We present scalable techniques to detect, propagate, and explain errors. We also evaluate the effectiveness and efficiency of these techniques on the TPC-H benchmark and on real-world datasets, for different classes of quality rules.
-----------------------------------------------------------------------------
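To make the idea of tracing report-level violations back to the sources concrete, here is a minimal sketch. It is not the system presented in the talk. It assumes a report built as a plain union of three hypothetical source tables (store_a, store_b, store_c) and a single quality rule expressed as the functional dependency product -> price. It also uses a naive blame heuristic: every violating pair of report tuples is traced back through recorded lineage to its source tuples, and each source tuple is charged one unit per violation it takes part in. All table, column, and function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source tables; each row is (row_id, product, price).
SOURCES = {
    "store_a": [(1, "p1", 10.0), (2, "p2", 5.0)],
    "store_b": [(1, "p1", 12.0), (2, "p2", 5.0)],
    "store_c": [(1, "p1", 10.0), (2, "p3", 7.0)],
}


def build_view(sources):
    """Union the sources into a report, recording lineage (source, row_id) per tuple."""
    view = []
    for src, rows in sources.items():
        for row_id, product, price in rows:
            view.append({"product": product, "price": price, "lineage": (src, row_id)})
    return view


def fd_violations(view, lhs, rhs):
    """Return pairs of report tuples that agree on lhs but differ on rhs (FD lhs -> rhs)."""
    return [
        (t1, t2)
        for t1, t2 in combinations(view, 2)
        if t1[lhs] == t2[lhs] and t1[rhs] != t2[rhs]
    ]


def blame_sources(violations):
    """Propagate each violation to its source tuples via lineage; count involvement."""
    blame = Counter()
    for t1, t2 in violations:
        blame[t1["lineage"]] += 1
        blame[t2["lineage"]] += 1
    return blame


def blame_by_source(tuple_blame):
    """Aggregate tuple-level blame into a per-source total (a crude source-level explanation)."""
    per_source = Counter()
    for (src, _row_id), count in tuple_blame.items():
        per_source[src] += count
    return per_source


if __name__ == "__main__":
    view = build_view(SOURCES)
    violations = fd_violations(view, "product", "price")
    tuple_blame = blame_sources(violations)
    print("violations in report:", len(violations))
    print("most blamed source tuple:", tuple_blame.most_common(1))
    print("blame per source:", dict(blame_by_source(tuple_blame)))
```

In this toy data the report contains two violations, both on p1. The source tuple ("store_b", 1) takes part in both, so it is the most likely culprit, and store_b accumulates the most blame. Counting is only a stand-in for the explanation techniques the talk describes. It gives no answer, for instance, when two conflicting values are each supported by a single tuple.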