A Revival of Data Dependencies for Improving Data Quality
Wednesday 14th January 2009, 6:30 pm
Speaker: Professor Wenfei Fan, School of Informatics, The University of Edinburgh.
Venue: Room G.07, University of Edinburgh Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB.
This talk is free of charge. Refreshments available from 6:00 pm.
This is a repeat of Professor Fan's BCS Roger Needham Lecture 2008.
Recent statistics reveal that 1%-5% of real-world data in enterprises is dirty: inconsistent, inaccurate, incomplete and/or stale.
The prevalent use of the Internet has been increasing the risks, on an unprecedented scale, of creating and propagating dirty data. Dirty data is estimated to cost US industry alone billions of dollars a year.
There is no reason to believe that the scale of the problem is any different in the UK, or in any other society that is dependent on information technology. This highlights the need for principled approaches to improving data quality.
This talk presents a recent approach for detecting and repairing real-life dirty data. It is based on conditional dependencies, a revision of traditional database dependencies that enforces bindings of semantically related data values.
Whereas traditional database dependencies were developed to improve the quality of schemas, conditional dependencies provide a theory for improving the quality of the data itself.
Based on this theory, practical techniques have been developed for cleaning dirty data, which effectively reduce human effort and improve data quality. The techniques have drawn attention from industry in the UK and beyond.
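To give a flavour of what a conditional dependency looks like, here is a minimal Python sketch. It is not the speaker's algorithm: the rule ("for records whose country is UK, the zip code determines the street"), the function name `violations`, and the sample records are all hypothetical and made up for illustration. The point it shows is that the dependency is only required to hold on the subset of data matching a condition, rather than on the whole table.

```python
from collections import defaultdict


def violations(rows, condition, lhs, rhs):
    """Return groups of rows that match `condition`, agree on the `lhs`
    attributes, yet disagree on the `rhs` attributes.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for row in rows:
        # The dependency applies only to rows matching the condition.
        if all(row.get(attr) == value for attr, value in condition.items()):
            groups[tuple(row[a] for a in lhs)].append(row)
    # A group is inconsistent if it contains more than one distinct rhs value.
    return [
        group
        for group in groups.values()
        if len({tuple(r[a] for a in rhs) for r in group}) > 1
    ]


# Made-up sample records, not real data.
customers = [
    {"country": "UK", "zip": "EH8 9AB", "street": "Crichton Street"},
    {"country": "UK", "zip": "EH8 9AB", "street": "Mayfield Road"},
    {"country": "US", "zip": "07974", "street": "Mountain Ave"},
    {"country": "US", "zip": "07974", "street": "Main St"},
]

# Conditional rule: zip determines street, but only where country is UK.
dirty = violations(customers, {"country": "UK"}, ["zip"], ["street"])
```

In this sketch only the two UK records are reported as inconsistent. The two US records share a zip code and differ on street, but the rule does not apply to them. Passing an empty condition turns the rule into an ordinary, unconditional dependency, which would flag both pairs.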
About the speaker