I am using DBSCAN for clustering. Now I want to pick a point from each cluster that represents it, but I realized that DBSCAN does not have centroids the way k-means does.

However, I observed that DBSCAN has something called core points. I am wondering whether it is possible to use these core points, or some other alternative, to obtain a representative point from each cluster.

The code I have used is shown below.

import numpy as np
from math import pi
from sklearn.cluster import DBSCAN

#points containing time value in minutes
points = [100, 200, 600, 659, 700]

def convert_to_radian(x):
    return((x / (24 * 60)) * 2 * pi)

rad_function = np.vectorize(convert_to_radian)
points_rad = rad_function(points)

#pairwise signed differences between the points (in radians)
dist = points_rad[None, :] - points_rad[:, None]

#wrap differences larger than half a day so each pair uses the
#shorter arc around the 24-hour circle
dist[(dist > pi) & (dist <= 2 * pi)] -= 2 * pi
dist[(dist > -2 * pi) & (dist <= -pi)] += 2 * pi
dist = abs(dist)

#check dist
print(dist)

#eps is a 100-minute tolerance expressed in radians; metric='precomputed'
#because we pass the distance matrix directly
db = DBSCAN(eps=((100 / (24 * 60)) * 2 * pi), min_samples=2, metric='precomputed')

#check db
print(db)

db.fit(dist)

#get labels
labels = db.labels_

#get number of clusters
no_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print('No of clusters:', no_clusters)
print('Cluster 0 : ', np.nonzero(labels == 0)[0])
print('Cluster 1 : ', np.nonzero(labels == 1)[0])

print(db.core_sample_indices_)

I am happy to provide more details if needed.

  • Just in case you don't know: Kmeans is a centroid-based method (each cluster is just a centroid and all points belong to the nearest centroid). DBSCAN is density-based, so the resulting clusters can have any shape, as long as there are points close enough to each other. So DBSCAN could also result in a "ball"-cluster in the center with a "circle"-cluster around it. Both clusters would have the same "centroid" in that case, which is the reason why computing centroids for DBSCAN results can be highly misleading. So take care when working with those centroids (or use a centroid-based method). – Niklas Mertsch Jun 06 '20 at 08:39
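
To see that point concretely, here is a toy example (parameter values are chosen only for illustration): DBSCAN finds two clusters, and both have roughly the same mean, even though only one of them actually surrounds it.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.cluster import DBSCAN

#toy data: a small circle in the centre with a larger ring around it
X, _ = make_circles(n_samples=500, factor=0.2, noise=0.02, random_state=0)

labels = DBSCAN(eps=0.15, min_samples=5).fit_predict(X)

#both clusters have a centroid near the origin, so the ring's centroid
#is not a useful representative point for the ring
for k in set(labels) - {-1}:
    print(k, X[labels == k].mean(axis=0))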

2 Answers


Why don't you estimate the centroids of the resulting clusters?

#rows of the precomputed distance matrix that belong to cluster 0
points_of_cluster_0 = dist[labels == 0, :]
#mean distance profile of cluster 0 (its "centroid" in the precomputed-distance representation)
centroid_of_cluster_0 = np.mean(points_of_cluster_0, axis=0)
print(centroid_of_cluster_0)

points_of_cluster_1 = dist[labels == 1, :]
centroid_of_cluster_1 = np.mean(points_of_cluster_1, axis=0)
print(centroid_of_cluster_1)
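
If you would rather report an actual data point than a mean of distance rows, one option (just a sketch, reusing the dist, labels and points variables from the question) is to take, per cluster, the member whose distance profile is closest to that mean:

#medoid-like representative: the cluster member whose distance profile
#is closest to the cluster's mean profile
for k in set(labels) - {-1}:                       #skip noise points (-1)
    members = np.flatnonzero(labels == k)
    mean_profile = dist[members].mean(axis=0)
    rep = members[np.argmin(np.linalg.norm(dist[members] - mean_profile, axis=1))]
    print('representative of cluster', k, ':', points[rep], 'minutes')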
seralouk

Maybe do a KDE-style density estimate row by row, e.g. density_i = np.where(cdist(x[i:i+1], x[inds]) - cut_off < 0, 1, 0).sum(1) for each point i in a cluster (where inds = np.argwhere(cluster_results == cluster_index)), and take the point with the highest density in each cluster; that is the most representative centroid. This may still be slow if the dataset is massive.
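
A runnable version of that idea (the function and variable names here are only illustrative, and the cut-off value is up to you) might look like this:

import numpy as np
from scipy.spatial.distance import cdist

def densest_point_per_cluster(x, labels, cut_off):
    #x: (n_samples, n_features) Euclidean array, labels: DBSCAN labels,
    #cut_off: neighbourhood radius for the crude density estimate
    representatives = {}
    for k in set(labels) - {-1}:                      #skip noise (-1)
        inds = np.flatnonzero(labels == k)            #members of cluster k
        #for each member, count how many members lie within cut_off
        density = (cdist(x[inds], x[inds]) < cut_off).sum(axis=1)
        representatives[k] = inds[np.argmax(density)] #densest member
    return representatives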

Chonk
  • NB: as mentioned in the comment above, a non-Euclidean dataset q first needs to be represented/featurized/mapped to a Euclidean coordinate system x := map(q), even before going into DBSCAN. [In terms of the two GPS coordinates, one of them (the around-the-equator one) is mapped to a 2D circle by [sin, cos](q[:, 0]), and the other one (north to south) probably to a semi-circle by [cos](q[:, 1]), so x is 3D.] – Chonk Jul 27 '22 at 01:44
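
For the time-of-day data in the question, that mapping could look like the sketch below (converting the 100-minute eps to a chord length is my own choice, made to keep roughly the same tolerance as the precomputed version):

import numpy as np
from math import pi
from sklearn.cluster import DBSCAN

#map minutes-of-day onto the unit circle so plain Euclidean DBSCAN
#respects the wrap-around at midnight
points = np.array([100, 200, 600, 659, 700])
theta = points / (24 * 60) * 2 * pi
x = np.column_stack([np.sin(theta), np.cos(theta)])   #2-D Euclidean features

#chord length corresponding to a 100-minute arc (same tolerance as the question)
eps_chord = 2 * np.sin((100 / (24 * 60)) * pi)

labels = DBSCAN(eps=eps_chord, min_samples=2).fit_predict(x)
print(labels)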