Università Roma Tre, CCS Ingegneria Informatica, Dip. Informatica e Automazione
Information Systems (new curriculum)
Academic year 2004-2005

(held as part of a lecture of the Information Systems course, open to all interested)

Wednesday 15 December 2004, 2 pm, room N3

Logical Data Expiration
Prof David Toman
University of Waterloo (Canada)



Data expiration is an essential component of data warehousing solutions: whenever large amounts of data are repeatedly collected over a period of time, it is essential to have a clear approach to identifying the parts of the data that are no longer needed and a policy for disposing of and/or archiving them. Such policies are necessary even if storage could always be added to accommodate an ever-growing collection of data, since the growing amount of data must be examined during querying, which in turn degrades query performance over time. Approaches to data expiration range from ad hoc administrative policies or regulations to sophisticated techniques based on data analysis. The approaches have, however, one thing in common: intuitively, they try to identify the parts of the data collection that will not be needed in the future. The key to deciding whether a piece of information will be needed in the future lies in identifying what queries can be asked over the collection of data and how the collection can evolve from its current state. The various techniques proposed in the literature differ in the way they identify the parts of the data that are no longer needed. This talk formalizes the notion of data expiration in terms of how the data is used to answer queries. We survey existing approaches to the problem in a unified framework and discuss their features and limits, as well as the limits of data-expiration-based techniques in general. The particular focus of the talk is on comparing the space performance of various data expiration methods. Interestingly, the methods developed for data expiration are almost directly applicable to processing standing queries over data streams and to the construction of synopses.
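As a rough illustration of expiration defined by how data is used to answer queries, the sketch below fixes one standing query ("maximum reading so far") over an append-only history and keeps only the residue that query can still need. It is not taken from the talk: the event schema, the names `ExpiringMaxStore`, `max_over_full_history`, and `answers_agree`, and the choice of query are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Hypothetical schema for the example: (timestamp, reading).
Event = Tuple[int, float]


def max_over_full_history(history: Iterable[Event]) -> Optional[float]:
    """Reference answer: scan every event ever stored."""
    values = [value for _, value in history]
    return max(values) if values else None


@dataclass
class ExpiringMaxStore:
    """Keeps only what the fixed query 'maximum reading so far' still needs.

    Assumption: the history is append-only, so an event below the current
    maximum can never influence a future answer and may be expired at once.
    """
    current_max: Optional[float] = None

    def append(self, event: Event) -> None:
        _, value = event
        if self.current_max is None or value > self.current_max:
            self.current_max = value
        # The event itself is never stored; only the running maximum survives.

    def query(self) -> Optional[float]:
        return self.current_max


def answers_agree(events: List[Event]) -> bool:
    """Check, after every append, that the expired store answers the query
    exactly as the full history would."""
    full_history: List[Event] = []
    store = ExpiringMaxStore()
    for event in events:
        full_history.append(event)
        store.append(event)
        if max_over_full_history(full_history) != store.query():
            return False
    return True
```

Here the retained state has constant size regardless of the length of the history, which is the kind of space behaviour the abstract says the talk compares across methods. A richer query language would generally need a larger residue, and this sketch makes no claim about how the surveyed methods handle that.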

Paolo Atzeni, Dipartimento di Informatica e Automazione, Università Roma Tre
Last modified 13/12/2004