On Sampling Based Algorithms for k-Means

Medientyp: Elektronischer Konferenzbericht; Sonstige Veröffentlichung; E-Artikel

Titel: On Sampling Based Algorithms for k-Means

Beteiligte: Bhattacharya, Anup [VerfasserIn]; Goyal, Dishant [VerfasserIn]; Jaiswal, Ragesh [VerfasserIn]; Kumar, Amit [VerfasserIn]

Erschienen: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020

Sprache: Englisch

DOI: https://doi.org/10.4230/LIPIcs.FSTTCS.2020.13

Schlagwörter: low rank approximation ; k-means

Entstehung:

Anmerkungen: Diese Datenquelle enthält auch Bestandsnachweise, die nicht zu einem Volltext führen.

Beschreibung: We generalise the results of Bhattacharya et al. [Bhattacharya et al., 2018] for the list-k-means problem defined as - for a (unknown) partition X₁, ., X_k of the dataset X ⊆ ℝ^d, find a list of k-center-sets (each element in the list is a set of k centers) such that at least one of k-center-sets {c₁, ., c_k} in the list gives an (1+ε)-approximation with respect to the cost function min_{permutation π} [∑_{i = 1}^{k} ∑_{x ∈ X_i} ||x - c_{π(i)}||²]. The list-k-means problem is important for the constrained k-means problem since algorithms for the former can be converted to {PTAS} for various versions of the latter. The algorithm for the list-k-means problem by Bhattacharya et al. is a D²-sampling based algorithm that runs in k iterations. Making use of a constant factor solution for the (classical or unconstrained) k-means problem, we generalise the algorithm of Bhattacharya et al. in two ways - (i) for any fixed set X_{j₁}, ., X_{j_t} of t ≤ k clusters, the algorithm produces a list of (k/(ε))^{O(t/(ε))} t-center sets such that (w.h.p.) at least one of them is good for X_{j₁}, ., X_{j_t}, and (ii) the algorithm runs in a single iteration. Following are the consequences of our generalisations: 1) Faster PTAS under stability and a parameterised reduction: Property (i) of our generalisation is useful in scenarios where finding good centers becomes easier once good centers for a few "bad" clusters have been chosen. One such case is clustering under stability of Awasthi et al. [Awasthi et al., 2010] where the number of such bad clusters is a constant. Using property (i), we significantly improve the running time of their algorithm from O(dn³) (k log{n})^{poly(1/(β), 1/(ε))} to O (dn³ (k/(ε)) ^{O(1/βε²)}). Another application is a parameterised reduction from the outlier version of k-means to the classical one where the bad clusters are the outliers. 2) Streaming algorithms: The sampling algorithm running in a single iteration (i.e., property (ii)) allows us to design a constant-pass, logspace streaming algorithm for ...

Zugangsstatus: Freier Zugang

Nur in Feld suchen:

Zuletzt gesuchte Begriffe: