University thesis:
Dissertation, Universität Köln, 2022
Description:
This thesis is a general work in data science that is not tied to any specific application context; instead, it contributes to several topics. Common to all contributions is a focus on the statistical analysis of data sets by supervised learning. The work comprises three independent essays. The second chapter corresponds to the paper 'Depth-based support vector classifiers to detect data nests of rare events' by Dyckerhoff and Stenz (2021) and designs a hybrid classification method: instead of classifying a data set with a binary target variable directly, the data is first transferred to a DD-plot (depth-versus-depth plot), and the classification is then carried out in that plot by a support vector classifier. The third chapter generalizes the approach of supervised factor models. The fourth and final chapter is a purely empirical project: a data mining analysis of medical billing data that examines whether algorithmic pre-screening for birth defects in newborn children is feasible. The analysis uses variants of the random forest as well as logistic regression.
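As a rough illustration of the first step of the hybrid method in the second chapter, the Python sketch below maps observations into a DD-plot: each point x becomes the pair (depth of x w.r.t. class 0, depth of x w.r.t. class 1). It assumes Mahalanobis depth, D(x) = 1 / (1 + (x - mu)^T S^{-1} (x - mu)), and NumPy; the thesis may use other depth functions, and all function names here are hypothetical. The simple maximum-depth rule is only a stand-in for the support vector classifier that, per the paper's title, operates on these two-dimensional coordinates.

```python
import numpy as np


def mahalanobis_depth(points, sample):
    """Mahalanobis depth of each row of `points` w.r.t. the empirical
    distribution of `sample` (assumed depth function, for illustration)."""
    mu = sample.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    diff = points - mu
    # Row-wise quadratic form (x - mu)^T S^{-1} (x - mu)
    md2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + md2)


def dd_plot_coordinates(points, class0, class1):
    """Hypothetical helper: map each point to its DD-plot coordinates
    (depth w.r.t. class 0, depth w.r.t. class 1)."""
    return np.column_stack([
        mahalanobis_depth(points, class0),
        mahalanobis_depth(points, class1),
    ])


def max_depth_rule(dd):
    """Baseline DD-classifier (stand-in for an SVM): predict class 1
    when the depth w.r.t. class 1 exceeds the depth w.r.t. class 0."""
    return (dd[:, 1] > dd[:, 0]).astype(int)


# Usage: a large majority class and a small "rare event" class
rng = np.random.default_rng(0)
class0 = rng.normal(0.0, 1.0, size=(200, 2))
class1 = rng.normal(3.0, 1.0, size=(50, 2))
dd = dd_plot_coordinates(np.vstack([class0, class1]), class0, class1)
labels = max_depth_rule(dd)
```

Because the DD-plot is always two-dimensional, whatever the dimension of the original data, any classifier trained in this space (such as an SVM in place of `max_depth_rule`) works on a low-dimensional, bounded representation.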