K-means with initialization

alobo · January 23, 2019, 1:17pm

It would be nice if K-means could be initialized with user-defined centroids, as in most implementations. When one of the classes is distinct but rare (e.g., objects on the sea surface), random sampling of the initial centroids is very inefficient.

e.g., in R:
kmeans(x, centers, iter.max = 10, nstart = 1, algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen"), trace = FALSE)
## S3 method for class 'kmeans'
fitted(object, method = c("centers", "classes"), ...)

Arguments

`x`	numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
`centers`	either the number of clusters, say k, or a set of initial (distinct) cluster centres. If a number, a random set of (distinct) rows in `x` is chosen as the initial centres.
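
To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written in plain Python rather than against any existing library API. The function name `kmeans`, its `centers` and `iter_max` parameters, and the toy "sea" and "objects" data are all made up for illustration; it is a basic Lloyd iteration started from caller-supplied centroids, not a proposal for the actual implementation.

```python
import math


def kmeans(points, centers, iter_max=10):
    """Lloyd's algorithm started from caller-supplied centroids (illustrative only)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iter_max):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its centroid
        if new_centers == centers:
            break  # converged: centroids no longer move
        centers = new_centers
    return centers, labels


# Hypothetical data: 200 "sea" pixels near (0, 0) and 3 rare "object" pixels near (10, 10).
sea = [(0.01 * (i % 20), 0.01 * (i // 20)) for i in range(200)]
objects = [(10.0, 10.0), (10.1, 9.9), (9.9, 10.1)]
points = sea + objects

# Seed one centroid per class by hand instead of sampling rows at random.
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

With the seeds given by hand, the three rare points end up in their own cluster. If the two seeds were instead drawn at random from the 203 rows of this toy set, an "object" row would be picked only about 3% of the time.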

jmichel · January 24, 2019, 9:26am

Thanks. Could you open a feature request on GitLab?

Please use the feature request template.