Questions tagged [dbscan]

DBSCAN stands for density-based spatial clustering of applications with noise. It is a popular density-based cluster analysis algorithm.

It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of the corresponding nodes. DBSCAN is one of the most common clustering algorithms and also one of the most cited in the scientific literature. OPTICS can be seen as a generalization of DBSCAN to multiple ranges, effectively replacing the ε parameter with a maximum search radius.
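As a rough illustration of the density-based idea, here is a minimal pure-Python sketch. It is not the code of any particular library. The names `dbscan`, `region_query` and `NOISE` are made up for this example, Euclidean distance is assumed, and `min_pts` is assumed to count the point itself.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label used here for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The neighborhood search here is a brute-force scan, so the sketch runs in quadratic time. Real implementations typically use a spatial index for the neighborhood queries.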

See also Wikipedia.

In R (software for statistical computing and graphics), the dbscan package implements this method.

563 questions
48 votes, 9 answers

scikit-learn: Predicting new points with DBSCAN

I am using DBSCAN to cluster some data using Scikit-Learn (Python 2.7): from sklearn.cluster import DBSCAN dbscan = DBSCAN(random_state=0) dbscan.fit(X) However, I found that there was no built-in function (aside from "fit_predict") that could…
slaw
38 votes, 5 answers

scikit-learn DBSCAN memory usage

UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI's DBSCAN implementation to do my clustering rather than scikit-learn's. It can be run from the command line…
JamesT
38 votes, 6 answers

Choosing eps and minpts for DBSCAN (R)?

I've been searching for an answer for this question for quite a while, so I'm hoping someone can help me. I'm using dbscan from the fpc library in R. For example, I am looking at the USArrests data set and am using dbscan on it as…
Belinda Chiera
32 votes, 5 answers

DBSCAN for clustering of geographic location data

I have a dataframe with latitude and longitude pairs. Here is what my dataframe looks like: order_lat order_long 0 19.111841 72.910729 1 19.111342 72.908387 2 19.111342 72.908387 3 19.137815 72.914085 4 19.119677 72.905081 5 …
Neil
22 votes, 2 answers

scikit-learn: clustering text documents using DBSCAN

I'm trying to use scikit-learn to cluster text documents. On the whole, I find my way around, but I have problems with specific issues. Most of the examples I found illustrate clustering using scikit-learn with k-means as the clustering algorithm.…
22 votes, 2 answers

DBSCAN in scikit-learn of Python: save the cluster points in an array

Following the example "Demo of DBSCAN clustering algorithm" from scikit-learn, I am trying to store in an array the x, y of each clustering class: import numpy as np from sklearn.cluster import DBSCAN from sklearn import metrics from…
Gianni Spear
17 votes, 2 answers

dbscan - setting limit on maximum cluster span

By my understanding of DBSCAN, it's possible for you to specify an epsilon of, say, 100 meters and — because DBSCAN takes into account density-reachability and not direct density-reachability when finding clusters — end up with a cluster in which…
user139014
13 votes, 4 answers

Python Clustering Algorithms

I've been looking around scipy and sklearn for clustering algorithms for a particular problem I have. I need some way of characterizing a population of N particles into k groups, where k is not necessarily known, and in addition to this, no a priori…
astromax
12 votes, 4 answers

DBSCAN on Spark: which implementation

I would like to do some DBSCAN on Spark. I have currently found 2 implementations: https://github.com/irvingc/dbscan-on-spark https://github.com/alitouka/spark_dbscan I have tested the first one with the sbt configuration given in its github but:…
Benjamin
11 votes, 4 answers

Python: DBSCAN in 3 dimensional space

I have been searching around for an implementation of DBSCAN for 3 dimensional points without much luck. Does anyone know a library that handles this, or have any experience with doing this? I am assuming that the DBSCAN algorithm can handle 3…
user2909415
11 votes, 2 answers

Estimating/Choosing optimal Hyperparameters for DBSCAN

I need to find naturally occurring classes of nouns based on their distribution with different prepositions (like agentive, instrumental, time, place etc.). I tried using k-means clustering, but it was of little help; it didn't work well, there was a lot of…
Riyaz
10 votes, 2 answers

How to get the centroids in DBSCAN sklearn?

I am using DBSCAN for clustering. However, now I want to pick a point from each cluster that represents it, but I realized that DBSCAN does not have centroids as in kmeans. However, I observed that DBSCAN has something called core points. I am…
EmJ
10 votes, 6 answers

What are some packages that implement semi-supervised (constrained) clustering?

I want to run some experiments on semi-supervised (constrained) clustering, in particular with background knowledge provided as instance level pairwise constraints (Must-Link or Cannot-Link constraints). I would like to know if there are any good…
user1271286
10 votes, 4 answers

In scikit-learn, can DBSCAN use sparse matrix?

I got a Memory Error when I was running the DBSCAN algorithm of scikit-learn. My data is about 20000*10000; it's a binary matrix. (Maybe it's not suitable to use DBSCAN with such a matrix. I'm a beginner in machine learning. I just want to find a cluster method…
9 votes, 3 answers

How to cluster an instance with Weka's DBSCAN?

I've been trying to use the DBSCAN clusterer from Weka to cluster instances. From what I understand I should be using the clusterInstance() method for this, but to my surprise, when taking a look at the code of that method, it looks like the…
Oak