
I have a question about clustering. When you use the k-means algorithm, you have to specify how many clusters you expect. My problem is that I have several runs in which the number of clusters varies. I checked, and there are some methods for estimating how many clusters you have, but as far as I can tell these work for two-dimensional problems. In my case, I have three features. Do you have an idea which algorithms I can use for a three-dimensional problem? I would be glad if someone could help me, because I did some research myself and could not find anything. :)

In this example, the algorithm should find two clusters, the single point as one cluster and the row of data points as the second: first example

In the second example, I expect the algorithm to find three clusters automatically, the long line, the short line, and the single point: second example

Thanks. :)

Lysapala

1 Answer


As @ForceBru said in the comments, you can also use the k-means algorithm for 3D data. I use the sklearn.cluster.KMeans class whenever I have to cluster 3D points.

Also take a look at this link, where you can find a simple example to get started:


The key part of the example provided in the link above is the following:

import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets

# fix the seed so that the k-means initialisation is reproducible
np.random.seed(5)

iris = datasets.load_iris()
X = iris.data
y = iris.target

estimators = [
    ("k_means_iris_8", KMeans(n_clusters=8)),
    ("k_means_iris_3", KMeans(n_clusters=3)),
    ("k_means_iris_bad_init", KMeans(n_clusters=3, n_init=1, init="random")),
]
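If the number of clusters is not known in advance, one common option is to fit k-means for several values of k and compare the mean silhouette score. The score is computed from pairwise distances, so it is not restricted to two features. Below is a minimal sketch of that idea; the synthetic 3D blobs from `make_blobs`, the candidate range 2 to 6, and the variable names are assumptions made for illustration, not part of the linked example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 3D data (an assumption for illustration): three well-separated blobs.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0, 0], [10, 10, 10], [-10, 10, 0]],
    cluster_std=0.5,
    random_state=0,
)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score is the suggested number of clusters.
best_k = max(scores, key=scores.get)
print("silhouette per k:", scores)
print("best k:", best_k)
```

This tends to work when the clusters are compact and roughly spherical; for elongated shapes such as lines of points, a density-based method is usually a better fit.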

You can also try the DBSCAN algorithm (though I am not an expert in it). Take a look here.

EDIT

I studied the DBSCAN algorithm from the sklearn.cluster module a little and also found an interesting SO answer here. So, when the number of clusters is not known a priori, you can do something like this (I have tried to reproduce your input):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

data = np.array(
    [[0,0,0], [1,1,1], [2,2,2], [3,3,3], [4,4,4], [5,5,5], [20, 20, 20]]
)

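# eps: maximum distance between two neighbouring points
# min_samples: points (including the point itself) needed to form a dense region
model = DBSCAN(eps=2.5, min_samples=2)
pred = model.fit_predict(data)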

fig = plt.figure()
ax = plt.axes(projection='3d')

ax.scatter(data[:,0], data[:,1], data[:,2], c=model.labels_, s=20)
plt.show()

# DBSCAN labels noise points as -1, so this count includes the noise group
print("number of clusters found: {}".format(len(set(model.labels_))))
print('cluster for each point: ', model.labels_)

Here is what I get from the code above: enter image description here
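One detail worth checking is how the isolated point is labelled. DBSCAN marks points that belong to no dense region with the label -1 (noise), so `len(set(labels))` counts the noise group as if it were a cluster. The sketch below, on the same input, separates the two counts and also shows the effect of `min_samples=1`, where every point is a core point and an isolated point therefore becomes its own cluster. The variable names are mine and only illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

data = np.array(
    [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [20, 20, 20]]
)

# Same parameters as above: the isolated point is labelled -1 (noise).
labels = DBSCAN(eps=2.5, min_samples=2).fit_predict(data)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)

# With min_samples=1 no point can be noise, so the single point forms a cluster.
labels_ms1 = DBSCAN(eps=2.5, min_samples=1).fit_predict(data)
print("clusters with min_samples=1:", len(set(labels_ms1)))
```

Whether a lone point should count as noise or as a cluster of size one depends on the application, so pick `min_samples` accordingly.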

Study the DBSCAN parameters in the documentation, in particular eps and min_samples, and then adjust them to meet your goals.
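A common heuristic for choosing eps is to look at the sorted distance from each point to its nearest neighbour and pick a value just above the flat part of that curve, before it jumps. Here is a minimal sketch using `sklearn.neighbors.NearestNeighbors` on the same sample input; treating the first neighbour as the relevant one matches `min_samples=2` and is an assumption you may need to change for other settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

data = np.array(
    [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5], [20, 20, 20]]
)

# Ask for 2 neighbours: when querying the training data, each point is
# returned as its own first neighbour at distance 0.
nn = NearestNeighbors(n_neighbors=2).fit(data)
distances, _ = nn.kneighbors(data)

# Column 1 holds the distance to the nearest *other* point; sort it to
# see where the values jump.
k_dist = np.sort(distances[:, 1])
print(k_dist)
```

For this input the points on the line all sit at the same small distance from their neighbour, while the isolated point is far away, so any eps between those two values separates them. On real data it helps to plot `k_dist` and look for the knee.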

Finally, here is an overview of many other clustering algorithms. Take a look at it!

Hope it helps!

blunova
  • Thanks for your answer, but as far as I can see in the code, you also have to provide how many clusters you're expecting. The problem in my case is that I don't know how many clusters I have. In a normal 2D problem, you can get the number of clusters with, for example, the elbow method or the average-silhouette method, but not for a three-dimensional problem. So I first have to figure out how to calculate n_clusters in your example code. – Lysapala May 19 '22 at 09:54
  • You are welcome! Sorry, I forgot to mention the DBSCAN algorithm, also from `sklearn`. Take a look [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html). In the applications I deal with, I rarely use DBSCAN, so I do not know it very well. Hope it helps! I have also edited my answer. – blunova May 19 '22 at 10:00
  • I did some additional research and also found this [link](https://plotly.com/python/v3/3d-point-clustering/) from Plotly. It also seems useful for your application. Let me know if it helps! :) – blunova May 19 '22 at 10:05
  • Somehow it doesn't work. It doesn't cluster anything. Maybe it would be helpful to show what the plots look like. I'm adding to my question the kind of plots I get and want to cluster. – Lysapala May 19 '22 at 10:57
  • Which algorithm does not work? – blunova May 19 '22 at 11:02
  • The one from Plotly. The pictures I posted are from that algorithm. – Lysapala May 19 '22 at 12:18
  • I have added an edit to my answer using the DBSCAN algorithm for the case where the number of clusters is unknown, which should be what you are looking for. – blunova May 20 '22 at 15:07
  • Yes, I also tried DBSCAN and it is definitely the right algorithm. I'm going to figure out what epsilon I need, maybe with a norm. Thank you all for your help, you guys are awesome. :) – Lysapala May 21 '22 at 22:32