Efficiently Computing Inclusion Dependencies for Schema Discovery

Media type: Electronic conference paper

Title: Efficiently Computing Inclusion Dependencies for Schema Discovery

Contributors: Bauckmann, Jana [Author]; Leser, Ulf [Author]; Naumann, Felix [Author]

Published: Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2006-04-01

Language: English

DOI: https://doi.org/10.1109/ICDEW.2006.54; https://doi.org/10.18452/9215

Keywords: Schema Management; Metadata; Profiling; Data Integration

Notes: This data source also contains holdings records that do not link to a full text.

Description: Large data integration projects must often cope with undocumented data sources. Schema discovery aims at automatically finding structures in such cases. An important class of relationships between attributes that can be detected automatically are inclusion dependencies (INDs), which provide an excellent basis for guessing foreign key constraints. INDs can be discovered by comparing the sets of distinct values of pairs of attributes. In this paper we present efficient algorithms for finding unary INDs. We first show that (and why) SQL is not suitable for this task. We then develop two algorithms that compute inclusion dependencies outside of the database. Both are much faster than the SQL-based methods; in fact, for larger schemas they are the only feasible solution. Our experiments show that we can compute all unary INDs in a schema of 1,680 attributes with a total database size of 3.2 GB in approximately 2.5 hours. Peer reviewed. Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2006, 3-7 April 2006, pp. 2-2, Second International Workshop on Database Interoperability (InterDB 06), Atlanta, Georgia, USA, 03.04.2006 - 07.04.2006
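The description states that unary INDs can be found by comparing the sets of distinct values of attribute pairs. The following is a minimal Python sketch of that idea under stated assumptions; it is not the authors' two algorithms, which the record does not detail. It assumes columns are given as a hypothetical dictionary mapping `table.column` names to value lists, that NULLs (`None`) are ignored, that attributes with no non-NULL values are skipped rather than reported as trivially included, and that values within a column are mutually comparable so they can be sorted. `unary_inds` tests every ordered pair with a set-containment check, and `included_sorted` shows how the same test can be done in a single merge pass over sorted, duplicate-free value lists.

```python
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple


def distinct_values(columns: Dict[str, Iterable]) -> Dict[str, Set]:
    # Assumption: NULLs (None) do not take part in the inclusion test.
    return {name: {v for v in vals if v is not None} for name, vals in columns.items()}


def unary_inds(columns: Dict[str, Iterable]) -> List[Tuple[str, str]]:
    """Return all pairs (dep, ref) with distinct(dep) ⊆ distinct(ref), dep != ref."""
    sets = distinct_values(columns)
    result = []
    for dep, ref in permutations(sets, 2):
        # Assumption: skip empty dependent sets instead of reporting trivial INDs.
        if sets[dep] and sets[dep] <= sets[ref]:
            result.append((dep, ref))
    return sorted(result)


def included_sorted(dep: List, ref: List) -> bool:
    """Check dep ⊆ ref by merging two sorted, duplicate-free lists in one pass."""
    j = 0
    for v in dep:
        # Advance through ref until reaching a value >= v.
        while j < len(ref) and ref[j] < v:
            j += 1
        if j == len(ref) or ref[j] != v:
            return False  # v has no match in ref
        j += 1
    return True


if __name__ == "__main__":
    # Hypothetical example data, not taken from the paper.
    columns = {
        "orders.customer_id": [1, 2, 2, 3],
        "customers.id": [1, 2, 3, 4],
        "customers.zip": [10115, 10117, None],
    }
    print(unary_inds(columns))
    print(included_sorted([1, 2, 3], [1, 2, 3, 4]))
```

On this toy data the only IND found is `orders.customer_id ⊆ customers.id`, the kind of candidate foreign key the description mentions. Testing all ordered pairs is quadratic in the number of attributes, which is why the merge-based check over sorted value lists is the more interesting building block for large schemas.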

Access status: Open access

Rights/usage notes: Protected by copyright
