Precomputed distance matrix in DBSCAN

Question

Reading around, I find it is possible to pass a precomputed distance matrix into SKLearn DBSCAN. Unfortunately, I don't know how to pass it for calculation.

Say I have a 1D array with 100 elements, with just the names of the nodes. Then I have a 2D matrix, 100x100 with the distance between each element (in the same order).

I know I have to call it:

db = DBSCAN(eps=2, min_samples=5, metric="precomputed")

For a distance between nodes of 2 and a minimum of 5 node clusters. Also, use "precomputed" to indicate to use the 2D matrix. But how do I pass the info for the calculation?

The same question could apply if using RAPIDS CUML DBScan function (GPU accelerated).

warped · Answer 1 · 2020-07-02T14:28:17.013

8

documentation:

class sklearn.cluster.DBSCAN(eps=0.5, *, min_samples=5, metric='euclidean', 
metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)
[...]

[...]
metricstring, or callable, default=’euclidean’
The metric to use when calculating distance between instances in a feature array. If 
metric is a string or callable, it must be one of the options allowed by 
sklearn.metrics.pairwise_distances for its metric parameter. If metric is 
“precomputed”, X is assumed to be a distance matrix and must be square. X may be a 
Glossary, in which case only “nonzero” elements may be considered neighbors for  
DBSCAN.
[...]

So, the way you normally call this is:

from sklearn.cluster import DBSCAN

clustering = DBSCAN()
DBSCAN.fit(X)

if you have a distance matrix, you do:

from sklearn.cluster import DBSCAN

clustering = DBSCAN(metric='precomputed')
clustering.fit(distance_matrix)

edited Jul 02 '20 at 14:28

answered Jul 02 '20 at 12:03

warped

8,947
3
22
49

2

So X will be the square matrix with the distances pairwise, right? Shouldn't be metric='precomputed' instead of metric='fixed' ? – Jaime Nebrera Jul 02 '20 at 14:25
could you please review the details I give bellow? Thx – Jaime Nebrera Jul 06 '20 at 10:16
True, X should be pairwise distance matrix, and the metric should be 'precomputed'. Answer from @warped is true. – researchcollege111 Nov 15 '22 at 15:36

Precomputed distance matrix in DBSCAN

1 Answers1