6

I was learning about non-linear clustering algorithms and came across this 2-D plot. Which clustering algorithm, and which combination of hyper-parameters, would cluster this data well?

Plot

A human would pick out the 5 spikes as clusters, and I want my algorithm to do the same. I tried KMeans, but it only split the data horizontally or vertically. I then tried GMM but couldn't find hyper-parameters that gave the desired clustering.

Has QUIT--Anony-Mousse
rrm_2016

3 Answers

3

If a clustering algorithm doesn't work, always try to improve the preprocessing first. Algorithms such as k-means are very sensitive to scaling, so the scaling needs to be chosen carefully.
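As a minimal sketch of the scaling point, assuming scikit-learn and a made-up data set whose two features live on very different scales (the data, the choice of `StandardScaler`, and `n_clusters=5` are all assumptions for illustration, not taken from the question's plot):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(0.0, 1.0, 300),      # feature on a unit scale
    rng.normal(0.0, 1000.0, 300),   # feature on a much larger scale
])

# Rescale each feature to zero mean and unit variance so that
# neither axis dominates the Euclidean distances k-means relies on.
X_scaled = StandardScaler().fit_transform(X)

# Cluster the rescaled data; n_clusters=5 is an assumed value.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)
```

Standardization is only one possible choice. Depending on the data, a different rescaling (or none on some axes) may suit the cluster shapes better.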

GMM is clearly your first choice here. It may be worth trying different tools: R's Mclust is very slow, sklearn's GMM is sometimes unstable, and ELKI is a bit harder to get started with, but its EM usually gave me the best results.
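For the sklearn route, a starting point might look like the sketch below. The synthetic spikes are a hypothetical stand-in for the plot, and the hyper-parameter values are assumptions to tune, not known-good settings. A full covariance matrix per component lets each Gaussian stretch along a spike, and several restarts (`n_init`) are one way to reduce the sensitivity to initialization.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for the plot: 5 narrow, elongated "spikes".
rng = np.random.default_rng(0)
X = np.vstack([
    np.column_stack([
        rng.normal(3.0 * k, 0.2, 200),  # narrow in x
        rng.normal(0.0, 3.0, 200),      # long in y
    ])
    for k in range(5)
])

gmm = GaussianMixture(
    n_components=5,          # one component per expected spike
    covariance_type="full",  # each component gets its own elliptical shape
    n_init=10,               # keep the best of 10 EM runs
    random_state=0,
)
labels = gmm.fit_predict(X)      # hard assignment per point
proba = gmm.predict_proba(X)     # soft assignment (responsibilities)
```

If the number of spikes is not known in advance, comparing `gmm.bic(X)` across several values of `n_components` is a common heuristic.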

Apart from GMM, it is likely worth trying correlation clustering. These algorithms assume there is some manifold (e.g., a line) on which a cluster exists. Examples include ORCLUS, LMCLUS, CASH, 4C, ... But in my opinion these mostly work for synthetic toy data.

Has QUIT--Anony-Mousse
1

I suggest trying hierarchical clustering. In the agglomerative approach, each point starts as its own cluster, and clusters are then merged step by step based on their distances from each other.
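A minimal sketch of the agglomerative approach with scikit-learn, on hypothetical well-separated spikes (the data and the choice of single linkage are assumptions). Single linkage merges clusters by their closest pair of points, which tends to follow elongated shapes, but it can chain clusters together if the spikes touch or the data is noisy.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical data: 5 vertical spikes, 5 units apart, with small x jitter.
rng = np.random.default_rng(0)
n_per_spike = 200
y = np.linspace(0.0, 10.0, n_per_spike)
X = np.vstack([
    np.column_stack([5.0 * k + rng.normal(0.0, 0.1, n_per_spike), y])
    for k in range(5)
])
true_spike = np.repeat(np.arange(5), n_per_spike)  # ground truth for checking

# Start from single points and merge the two closest clusters
# (closest pair of members) until 5 clusters remain.
model = AgglomerativeClustering(n_clusters=5, linkage="single")
labels = model.fit_predict(X)
```

Other linkage options (`"ward"`, `"complete"`, `"average"`) favor more compact clusters and may behave more like k-means on this kind of data.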

Abhineet Gupta
1

DBSCAN or GMM should work well to cluster this type of data.

Both are among the few clustering algorithms that do not force the data into circular clusters.

Clustering with DBSCAN

DBSCAN
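A rough sketch of how a DBSCAN run could be set up with scikit-learn. The tilted spikes are hypothetical stand-in data, and `eps` and `min_samples` are assumed values picked for that data, not the settings behind the figure above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: 5 tilted spikes with small horizontal jitter.
rng = np.random.default_rng(0)
n_per_spike = 200
y = np.linspace(0.0, 10.0, n_per_spike)
X = np.vstack([
    np.column_stack([
        5.0 * k + 0.3 * y + rng.normal(0.0, 0.1, n_per_spike),
        y,
    ])
    for k in range(5)
])
true_spike = np.repeat(np.arange(5), n_per_spike)  # ground truth for checking

# eps must be larger than the gaps inside a spike but smaller than
# the gap between neighbouring spikes; both values are assumptions.
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X)

n_clusters = len(set(labels.tolist()) - {-1})  # -1 marks noise points
n_noise = int((labels == -1).sum())
```

DBSCAN does not take the number of clusters as input. On real data, `eps` is usually the parameter that needs the most tuning, and scaling the features first affects what a sensible `eps` is.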

Clustering with GMM

GMM

Also, please give this blog a read. It explains the different clustering techniques.

skillsmuggler