34

Does anyone have a paper that explains how the Ckmeans.1d.dp algorithm works?

Or: what is the most optimal way to do k-means clustering in one-dimension?

Asiri Hewage
  • 577
  • 3
  • 16
Laciel
  • 367
  • 1
  • 3
  • 6
  • Google turns up the tech. report Knops, Maintz, Pluim & Viergever (2004), Optimal one-dimensional k-means clustering using dynamic programming from Utrecht University, which is not available online. Unfortunately, the C++ code of this module is very unreadable. +1 for an interesting question. – Fred Foo Oct 23 '11 at 22:30
  • I think this is the paper you're looking for: [**Ckmeans.1d.dp: Optimal *k*-means Clustering in One Dimension by Dynamic Programming** by Haizhou Wang and Mingzhou Song](http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf). – Christoph Hösler Jul 09 '12 at 15:22

1 Answers1

8

Univariate k-means clustering can be solved in O(kn) time (on already sorted input) based on theoretical results on Monge matrices, but the approach was not popular most likely due to numerical instability and also perhaps coding challenges.

A better option is an O(knlgn) method that is now implemented in Ckmeans.1d.dp version 3.4.6. This implementation is as fast as heuristic k-means but offers guaranteed optimality, orders of magnitude better than heuristic k-means especially for large k's.

The generic dynamic programming solution by Richard Bellman (1973) does not touch upon specifics of the k-means problem and the implied runtime is O(kn^3).

user6417312
  • 81
  • 1
  • 1