
I have an array of coordinate data (in Web Mercator Eastings and Northings, thus in metres) that looks like this:

array([[ -232372.201264,  6785082.61011 ],
       [ -233396.451899,  6784865.49884 ],
       [ -234045.110572,  6784642.2575  ],
       ...,
       [ -234473.356653,  6778646.81953 ],
       [ -234918.300657,  6778772.69366 ],
       [ -230900.668915,  6778369.2902  ]])

This array is stored as the variable 'coords'.

I am attempting to compute, and then plot, the clusters in this dataset using scikit-learn's DBSCAN (thanks to this post for getting me this far).

The code I am using is taken from this tutorial; however, I get an AttributeError. The code and error are shown below:

db = DBSCAN(eps=0.2, min_samples=1, metric="precomputed")
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))

...

AttributeError: 'DBSCAN' object has no attribute 'labels_'

Can anyone explain where I'm going wrong?

Asked by the_bonze (edited by Community)
  • What version of sklearn are you using? – Grr Apr 25 '17 at 12:15
  • @Grr I'm using v0.18.1 – the_bonze Apr 25 '17 at 12:19
  • Web Mercator is **not in meters** but in pixels at the given zoom level? Also, it does not work at the 180 degree line... and you get substantial error because of the distortion. – Has QUIT--Anony-Mousse Apr 25 '17 at 18:51
  • Pick two cities east-west of each other, e.g., New York and San Francisco, and check their distance! – Has QUIT--Anony-Mousse Apr 25 '17 at 19:05
  • @Anony-Mousse the Eastings and Northings, which I was referring to, are values in metres: https://epsg.io/3857 – the_bonze Apr 27 '17 at 06:57
  • OK, so it seems to be the zoom level set to match the equator. Did you try the distance I mentioned, to get a feeling of how big your errors are? I'd rather not rely on the eastings unless close to the equator. Because I'd assume you get a distance some 1.4x as large as the *actual* distance, and that is a quite substantial error. Mercator projections are unsuitable for distances. – Has QUIT--Anony-Mousse Apr 27 '17 at 07:37
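
The scale-factor point raised in these comments can be checked numerically. EPSG:3857 uses the spherical Mercator formulas with radius R = 6378137 m, so a northing y corresponds to latitude φ = 2·atan(exp(y/R)) - π/2, and projected distances near that latitude are roughly 1/cos(φ) times the true ground distance. The sketch below is illustrative only: the helper names are hypothetical, and `projected_eps` assumes all points span a narrow band of latitudes so a single scale factor is a fair approximation.

```python
import math

# Radius used by EPSG:3857 (Web Mercator), in metres.
EARTH_RADIUS_M = 6378137.0


def northing_to_latitude(northing_m):
    """Invert the spherical Mercator northing to a latitude in degrees."""
    return math.degrees(2.0 * math.atan(math.exp(northing_m / EARTH_RADIUS_M)) - math.pi / 2.0)


def mercator_scale_factor(northing_m):
    """Ratio of projected distance to true ground distance: 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(northing_to_latitude(northing_m)))


def projected_eps(true_eps_m, northing_m):
    """Hypothetical helper: approximate eps in EPSG:3857 units for a ground distance.

    Assumes every point lies close to this northing (one scale factor for all).
    """
    return true_eps_m * mercator_scale_factor(northing_m)


# First northing from the question's array.
y = 6785082.61011
print(northing_to_latitude(y))   # latitude in degrees
print(mercator_scale_factor(y))  # projected distance / ground distance
print(projected_eps(100.0, y))   # eps for ~100 m on the ground
```

For the first row of `coords` this gives a latitude of roughly 52° N and a scale factor of roughly 1.62, so an eps intended as 100 m on the ground would be about 162 m in EPSG:3857 units.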

2 Answers


You are missing the call to fit():

import pandas as pd
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.2, min_samples=1, metric="precomputed")
db.fit(coords)  # fit() is what creates db.labels_
cluster_labels = db.labels_
num_clusters = len(set(cluster_labels))
clusters = pd.Series([coords[cluster_labels == n] for n in range(num_clusters)])
print('Number of clusters: {}'.format(num_clusters))
Answered by Abhishek Thakur
  • Thanks! However, I'm now getting a new error: ValueError: Precomputed metric requires shape (n_queries, n_indexed). Got (10487, 2) for 10487 indexed. – the_bonze Apr 25 '17 at 12:39
  • Remove `metric="precomputed"`. – Abhishek Thakur Apr 25 '17 at 12:52
  • Seems to work - thank you! How does removing "precomputed" affect the outcome? Apologies for the n00b questions; this is all new to me. Thank you for your help :) – the_bonze Apr 25 '17 at 12:53
  • From docs: If metric is “precomputed”, X is assumed to be a distance matrix and must be square. X may be a sparse matrix, in which case only “nonzero” elements may be considered neighbors for DBSCAN. – Abhishek Thakur Apr 25 '17 at 12:59
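
With `metric="precomputed"` removed, DBSCAN uses its default Euclidean metric directly on the (n_samples, 2) array, so eps is in the same units as `coords` (metres here; the question's eps=0.2 therefore joins only points within 0.2 m of each other). A minimal sketch of the whole pipeline follows. The toy coordinates, eps=50 and min_samples=2 are assumptions chosen only so that one point ends up as noise. DBSCAN labels noise points -1; with the question's min_samples=1 every point is a core point, so no noise occurs and `range(num_clusters)` works, but with larger min_samples `len(set(cluster_labels))` would count the noise label as an extra cluster, so this sketch excludes it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Hypothetical coordinates in metres (not the question's data).
coords = np.array([
    [0.0, 0.0], [10.0, 0.0], [0.0, 10.0],  # group A
    [500.0, 500.0], [510.0, 505.0],        # group B
    [5000.0, 5000.0],                      # isolated point
])

# metric defaults to "euclidean", so eps is in the same units as coords.
db = DBSCAN(eps=50, min_samples=2)
print(hasattr(db, "labels_"))  # False: labels_ is only set by fit()

db.fit(coords)
cluster_labels = db.labels_

# Noise points carry the label -1; keep them out of the cluster count.
cluster_ids = sorted(set(cluster_labels) - {-1})
num_clusters = len(cluster_ids)
clusters = pd.Series([coords[cluster_labels == n] for n in cluster_ids])
noise = coords[cluster_labels == -1]

print('Number of clusters: {}'.format(num_clusters))
print('Noise points: {}'.format(len(noise)))
```

`DBSCAN(...).fit_predict(coords)` is an alternative that runs fit() and returns the labels in one call.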

You have to call it like this:

# mymatrix must be a square (n_samples, n_samples) distance matrix when metric="precomputed"
db = DBSCAN(eps=0.2, min_samples=1, metric="precomputed").fit(mymatrix)

(Note the fit() call.)

Answered by Leo Martins
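
If `metric="precomputed"` is kept, the matrix passed to fit() has to be a square (n_samples, n_samples) distance matrix rather than the (n_samples, 2) coordinates, which is what the ValueError reported in the comments on the first answer is about. The sketch below assumes `sklearn.metrics.pairwise_distances` to build the matrix and uses hypothetical toy data; `mymatrix` is just the placeholder name from this answer. For Euclidean distances it should produce the same labels as passing the coordinates directly. A dense n×n matrix grows quadratically: for the 10487 points mentioned in the comments it would hold about 110 million float64 values (roughly 0.9 GB), which is one reason to prefer the default metric here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical coordinates in metres, shape (n_samples, 2).
coords = np.array([[0.0, 0.0], [0.1, 0.0], [100.0, 100.0], [100.1, 100.0]])

# Square (n_samples, n_samples) matrix of Euclidean distances between points.
mymatrix = pairwise_distances(coords, metric="euclidean")

# Cluster from the precomputed distances...
db_pre = DBSCAN(eps=0.2, min_samples=1, metric="precomputed").fit(mymatrix)
# ...and, for comparison, from the raw coordinates with the default metric.
db_raw = DBSCAN(eps=0.2, min_samples=1).fit(coords)

print(mymatrix.shape)
print(db_pre.labels_, db_raw.labels_)
```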