Accuracy in the Application of Statistical Matching Methods for Continuous Variables Using Auxiliary Data

Media type: E-article

Title: Accuracy in the Application of Statistical Matching Methods for Continuous Variables Using Auxiliary Data

Contributors: Van Delden, Arnout; Du Chatinier, Bart J; Scholtus, Sander

Published: Oxford University Press (OUP), 2020

Language: English

DOI: 10.1093/jssam/smz032

ISSN: 2325-0984; 2325-0992

Keywords: Applied Mathematics; Statistics, Probability and Uncertainty; Social Sciences (miscellaneous); Statistics and Probability

Description: Statistical matching is a technique for combining variables from two or more nonoverlapping samples drawn from the same population. In the current study, the unobserved joint distribution of two target variables in nonoverlapping samples is estimated using a parametric model. A classical assumption for estimating this joint distribution is that the target variables are independent given the background variables observed in both samples. The problem with this conditional independence assumption is that the estimated joint distribution may be severely biased when the assumption does not hold, which in general is unacceptable for official statistics. Here, we explored to what extent the accuracy can be improved by using two types of auxiliary information: a common administrative variable and a small additional sample from a similar population. The additional sample is included either through the partial correlation of the target variables given the background variables or through an EM algorithm. In total, four different approaches to estimating the joint distribution of the target variables were compared. Starting with empirical data, we show how the accuracy of the estimated joint distribution is affected by the use of administrative data and by the size of the additional sample, whether that sample is included via a partial correlation or through an EM algorithm. The study further shows how this accuracy depends on the strength of the relations among the target and auxiliary variables. We found that including a common administrative variable does not always improve the accuracy of the results. We further found that the EM algorithm nearly always yielded the most accurate results; this effect is largest when the common background variables explain only a modest share of the variance of each target variable.
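
The difference between the conditional independence assumption and the partial correlation approach can be illustrated with a small sketch. It assumes a multivariate normal model and uses the standard identity cov(Y, Z) = S_yx S_xx^-1 S_xz + rho * sqrt(resvar_Y * resvar_Z), where rho is the partial correlation of the target variables Y and Z given the background variables X. The abstract itself gives neither this formula nor any code, so the function name `matched_covariance`, its parameters, and the numbers in the usage note are hypothetical and not the authors' implementation.

```python
import numpy as np


def matched_covariance(S_xx, s_xy, s_xz, var_y, var_z, partial_corr=0.0):
    """Covariance of Y and Z implied by a multivariate normal model.

    Hypothetical helper for illustration only.
    partial_corr=0.0 corresponds to the conditional independence assumption;
    a nonzero value would come from auxiliary information, for example a
    small additional sample in which Y and Z are observed together.
    """
    S_xx = np.atleast_2d(np.asarray(S_xx, dtype=float))
    s_xy = np.asarray(s_xy, dtype=float).reshape(-1)
    s_xz = np.asarray(s_xz, dtype=float).reshape(-1)

    # Part of cov(Y, Z) that runs through the common background variables X.
    explained = s_xy @ np.linalg.solve(S_xx, s_xz)

    # Residual variances of Y and Z after regressing each on X.
    res_var_y = var_y - s_xy @ np.linalg.solve(S_xx, s_xy)
    res_var_z = var_z - s_xz @ np.linalg.solve(S_xx, s_xz)

    # Add the part that is driven by the partial correlation given X.
    return explained + partial_corr * np.sqrt(res_var_y * res_var_z)
```

With a single standardized background variable, cov(X, Y) = 0.5 and cov(X, Z) = 0.4, the call `matched_covariance([[1.0]], [0.5], [0.4], 1.0, 1.0)` gives about 0.2 under conditional independence, while passing `partial_corr=0.5` raises the implied covariance to roughly 0.6. The gap between the two values shows how far the conditional independence estimate can be off when the assumption does not hold and the background variables explain little of the target variables.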
