I have a list of numbers in an array. The index of each element is X and the value is Y. How do I go about partitioning/clustering this data? Given the array, I just want a set of values that mark the end of each partition. Since I'm working in Python, please mention any libraries that do this.

Thanks.

Karthick
  • What's the data? What's your application? Are you sure you want clustering rather than segmenting? i.e. Do you want all points in a cluster to be contiguous X samples? This is what you'd usually do for a time series. – dimatura May 27 '11 at 06:53
  • possible duplicate of [not random clusters in 1D data set](http://stackoverflow.com/questions/5738490/not-random-clusters-in-1d-data-set) – Has QUIT--Anony-Mousse Feb 01 '13 at 07:42

1 Answer

K-Means is a very simple clustering algorithm, and I would say it is the first one to try before going for more complex things: http://en.wikipedia.org/wiki/K-means_clustering

Proper K-Means initialization is strongly advised; see k-means++: http://en.wikipedia.org/wiki/K-means%2B%2B
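To make this concrete, here is a minimal pure-Python sketch of K-Means on 1-D values with k-means++ seeding. The function names (`kmeans_pp_init`, `kmeans_1d`), the sample data and the fixed seed are all made up for illustration. Taking the midpoints between neighbouring centers as the partition boundaries is my assumption about what "values which mark the end of each partition" should mean.

```python
import random

def kmeans_pp_init(values, k, rng):
    """Pick k starting centers with k-means++ seeding."""
    centers = [rng.choice(values)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        if sum(d2) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(values))
        else:
            # Far-away points are proportionally more likely to be picked.
            centers.append(rng.choices(values, weights=d2, k=1)[0])
    return centers

def kmeans_1d(values, k, iterations=100, seed=0):
    """Lloyd's algorithm on 1-D data; returns sorted centers and boundaries."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(values, k, rng)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centers[i]) ** 2)
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort()
    # Assumed convention: a boundary is the midpoint between adjacent centers.
    boundaries = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return centers, boundaries

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0]
centers, boundaries = kmeans_1d(data, k=3)
print(centers, boundaries)
```

K-Means only finds a local optimum, so in practice you would probably run it with several seeds and keep the result with the lowest total squared distance.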

If you're not happy with K-Means, you can use the EM algorithm with a Gaussian mixture model ( http://en.wikipedia.org/wiki/Mixture_model ). It is not too hard to code, and you can use K-Means to initialize it!
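A rough sketch of EM for a 1-D Gaussian mixture, again in plain Python. The names (`gaussian_pdf`, `em_gmm_1d`), the `min_var` floor and the initial variance heuristic (overall variance divided by k squared) are my own choices for illustration, not part of any library. The starting means are hard-coded here to keep the example short; they would normally come from a K-Means run.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(values, init_means, iterations=200, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k, n = len(init_means), len(values)
    means = list(init_means)
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    # Heuristic starting variance (an assumption, not a rule).
    variances = [max(overall_var / k ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[j] * gaussian_pdf(v, means[j], variances[j])
                 for j in range(k)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component lost all its mass; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2
                      for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor avoids collapse to a point
    return weights, means, variances

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0]
weights, means, variances = em_gmm_1d(data, init_means=[1.0, 5.0, 10.0])
print(weights, means, variances)
```

Unlike K-Means, this gives soft assignments, so a partition boundary would be wherever two neighbouring components are equally likely.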

Both have been implemented many times in Python; check any machine learning toolbox.

Monkey
  • 5
    SciPy has a very friendly implementation of kmeans in its cluster package. I was just using it today as a matter of fact, and I happen to have the docs in another tab right now: http://docs.scipy.org/doc/scipy/reference/cluster.vq.html – jscs May 27 '11 at 03:27
  • 2
    **Don't use k-means on 1-d data. Use optimized 1-d techniques.**
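
The comment above does not name a specific 1-D technique, so the following is only one possible reading: sort the values and cut at the k-1 largest gaps between neighbours. The function name `split_at_largest_gaps` and the sample data are made up for illustration. It returns the last value of each segment, which matches the "values which mark the end of each partition" asked for in the question.

```python
def split_at_largest_gaps(values, k):
    """Sort the values and cut at the k-1 widest gaps between neighbours."""
    ordered = sorted(values)
    # Gap i lies between ordered[i] and ordered[i + 1].
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    # Indices of the k-1 widest gaps, restored to left-to-right order.
    cut_positions = sorted(i for _, i in sorted(gaps, reverse=True)[:k - 1])
    segments, start = [], 0
    for pos in cut_positions:
        segments.append(ordered[start:pos + 1])
        start = pos + 1
    segments.append(ordered[start:])
    # The end of each partition is the largest value in its segment.
    ends = [seg[-1] for seg in segments]
    return segments, ends

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0]
segments, ends = split_at_largest_gaps(data, k=3)
print(segments, ends)
```

This works because sorted 1-D data can only form clusters out of contiguous runs. It is sensitive to outliers, though, since a single stray value creates a wide gap of its own.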