• Media type: Electronic thesis; e-book; dissertation
  • Title: Parallel execution of causal structure learning on graphics processing units ; Parallele Ausführung von kausalem Strukturlernen auf Grafikprozessoren
  • Contributors: Hagedorn, Christopher [author]
  • Published: University of Potsdam: publish.UP, 2023
  • Language: English
  • DOI: https://doi.org/10.25932/publishup-59758
  • Keywords: Hasso-Plattner-Institut für Digital Engineering GmbH
  • Notes: This data source also contains holdings records that do not lead to a full text.
  • Description: Learning causal structures from observational data is an omnipresent challenge in data science. The amount of observational data available to Causal Structure Learning (CSL) algorithms is increasing, as data is nowadays collected at high frequency from many sources. While processing more data generally yields higher accuracy in CSL, the concomitant increase in the runtime of CSL algorithms hinders their widespread adoption in practice. CSL is a parallelizable problem. Existing parallel CSL algorithms target execution on multi-core Central Processing Units (CPUs) with dozens of compute cores. However, modern computing systems are often heterogeneous and equipped with Graphics Processing Units (GPUs) to accelerate computations. Typically, these GPUs provide several thousand compute cores for massively parallel data processing. To shorten the runtime of CSL algorithms, we design efficient execution strategies that leverage the parallel processing power of GPUs. In particular, we derive GPU-accelerated variants of a well-known constraint-based CSL method, the PC algorithm, as it allows choosing a statistical Conditional Independence test (CI test) appropriate to the characteristics of the observational data. Our two main contributions are: (1) to reflect differences in the CI tests, we design three GPU-based variants of the PC algorithm, each tailored to a CI test for data with particular characteristics. We develop one variant for data assuming the Gaussian distribution model, one for discrete data, and another for mixed discrete-continuous data and data with non-linear relationships. Each variant is optimized for the respective CI test, leveraging GPU hardware properties such as shared or thread-local memory. Our GPU-accelerated variants outperform state-of-the-art parallel CPU-based algorithms by factors of up to 93.4× for data assuming the Gaussian distribution model, up to 54.3× for discrete data, up to 240× for continuous data with non-linear relationships, and up to 655× for mixed discrete-continuous ... (An illustrative sketch of the constraint-based PC scheme appears after this record.)
  • Access status: Open access
  • Rights/usage notes: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
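
Below is a minimal, CPU-only Python sketch of the constraint-based scheme the description refers to: the PC algorithm's adjacency (skeleton) phase with a pluggable CI test, here Fisher's z test on partial correlations under the Gaussian distribution model. This is an illustration of the general technique only, not the dissertation's GPU-accelerated implementation; all function names, parameters, and the synthetic data are assumptions made for this sketch.

```python
# Minimal sketch of the PC skeleton phase with a pluggable CI test.
# Illustrative only; not the GPU-accelerated variants described in the thesis.
from itertools import combinations
import numpy as np
from scipy import stats

def fisher_z_ci_test(data, i, j, cond, alpha=0.01):
    """Test X_i independent of X_j given X_cond via partial correlation (Gaussian model)."""
    idx = [i, j] + list(cond)
    sub = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(sub)                          # precision matrix of the submatrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation of X_i, X_j
    r = np.clip(r, -0.999999, 0.999999)
    n = data.shape[0]
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_value > alpha                              # True -> independent -> remove edge

def pc_skeleton(data, ci_test, alpha=0.01, max_level=3):
    """Adjacency search: start from a complete graph and remove edges whose endpoints
    are conditionally independent given some subset of their neighbours."""
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    for level in range(max_level + 1):                  # size of the conditioning set
        for i in range(p):
            for j in list(adj[i]):
                neighbours = adj[i] - {j}
                if len(neighbours) < level:
                    continue
                for cond in combinations(neighbours, level):
                    if ci_test(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
    return adj

# Usage: estimate the skeleton of a small synthetic linear-Gaussian chain x0 -> x1 -> x2.
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = 0.8 * x0 + rng.normal(size=1000)
x2 = 0.8 * x1 + rng.normal(size=1000)
data = np.column_stack([x0, x1, x2])
print(pc_skeleton(data, fisher_z_ci_test))              # expected edges: 0-1 and 1-2 only
```

At a given level, the CI tests for the different edge/conditioning-set pairs are independent of one another, which is what makes the problem amenable to the massively parallel GPU execution the thesis targets.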