Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

Media type: Electronic conference paper

Title: Exploring the Gap Between Tolerant and Non-Tolerant Distribution Testing

Authors: Chakraborty, Sourav; Fischer, Eldar; Ghosh, Arijit; Mishra, Gopinath; Sen, Sayantan

Published in: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2022), 2022

Language: English

DOI: https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2022.27

Keywords: Tolerant Testing; Sample Complexity; Non-tolerant Testing; Distribution Testing; Data processing Computer science

Description: The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far, in some distance measure, from satisfying it. Tolerant testing imposes a further requirement: distributions that are close to satisfying the property must also be accepted. This work focuses on the connection between the sample complexity of non-tolerant testing of distributions and that of their tolerant counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap. For general properties that are not necessarily label-invariant, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using O(n) samples. Being non-concentrated is a strong requirement on properties, as we also prove a close-to-linear lower bound for their tolerant testing. Apart from the non-concentrated case, we also show that if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only O(s) samples. The learning procedure adapts to the input and works without knowing s in advance.
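
The description notes that learning a distribution with O(n) samples yields a tolerant tester for any property. A minimal sketch of that generic learn-then-test idea, specialized to uniformity, might look as follows. This is a standard textbook illustration, not the algorithm from the paper; the function names, the constant c = 4, and the midpoint acceptance threshold are illustrative assumptions. The sample count follows from the bound E[TV(p_hat, p)] <= sqrt(n/m)/2 for the empirical distribution p_hat of m samples.

```python
import random
from collections import Counter


def empirical_distribution(samples, n):
    """Empirical distribution over the universe {0, ..., n-1}."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]


def tv_distance(p, q):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tolerant_uniformity_test(sample_oracle, n, eps1, eps2, c=4):
    """Hypothetical learn-then-test sketch (not the paper's method).

    Should accept if the input is eps1-close to uniform and reject if it is
    eps2-far, with eps1 < eps2. Draws m ~ c * n / gap**2 samples, where
    gap = (eps2 - eps1) / 2. Since E[TV(p_hat, p)] <= sqrt(n / m) / 2 = gap / 4
    for c = 4, Markov's inequality gives TV(p_hat, p) < gap with prob. >= 3/4.
    """
    gap = (eps2 - eps1) / 2
    m = int(c * n / gap**2) + 1
    samples = [sample_oracle() for _ in range(m)]  # independent oracle calls
    p_hat = empirical_distribution(samples, n)
    uniform = [1.0 / n] * n
    # If TV(p_hat, p) < gap, the midpoint threshold separates the two cases.
    return tv_distance(p_hat, uniform) <= (eps1 + eps2) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform input over 20 elements: expected to be accepted.
    print(tolerant_uniformity_test(lambda: rng.randrange(20), 20, 0.05, 0.5))
    # Point mass on element 0 (0.95-far from uniform): always rejected.
    print(tolerant_uniformity_test(lambda: 0, 20, 0.05, 0.5))
```

The learning step is property-agnostic: replacing the comparison against the uniform distribution with a distance to any other property gives a tolerant tester for that property, which is why learning caps the gap discussed in the description.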

Access status: Open access

Rights/usage notes: Attribution (CC BY)