• Media type: Other publication; e-article
  • Title: PatchIndex: exploiting approximate constraints in distributed databases
  • Contributors: Kläbe, Steffen [Author]; Sattler, Kai-Uwe [Author]; Baumann, Stephan [Author]
  • Published: Digital Library Thüringen, 2021-03-06
  • Language: English
  • DOI: https://doi.org/10.1007/s10619-021-07326-1
  • Keywords: Self-managing databases -- Distributed databases -- Schema refinement -- Approximate constraints -- Uniqueness -- Patch processing ; ScholarlyArticle ; article
  • Notes: This data source also contains holdings records that do not lead to a full text.
  • Description: Cloud data warehouse systems lower the barrier to accessing data analytics. These applications often lack a database administrator and integrate data from various sources, potentially leading to data that does not satisfy strict constraints. Automatic schema optimization in self-managing databases is difficult in these environments without prior data cleaning steps. In this paper, we focus on constraint discovery as a subtask of schema optimization. Perfect constraints might not exist in these unclean datasets due to a small set of values violating the constraints. Therefore, we introduce the concept of a generic PatchIndex structure, which handles exceptions to given constraints and enables database systems to define these approximate constraints. We apply the concept to the environment of distributed databases, providing parallel index creation approaches and optimization techniques for parallel queries using PatchIndexes. Furthermore, we describe heuristics for automatic discovery of PatchIndex candidate columns and demonstrate the performance benefit of using PatchIndexes in our evaluation.
  • Access status: Open access