Questions tagged [hdbscan]

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a density-based clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most commonly used clustering algorithms and among the most cited in the scientific literature.

In 2014, the algorithm received the Test of Time Award (an award given to algorithms that have received substantial attention in theory and practice) at KDD, the leading data mining conference.
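
The density-based grouping described above can be illustrated with a minimal pure-Python sketch. This is not the reference implementation, nor the code used by the hdbscan or scikit-learn packages; the function names, the `eps` and `min_pts` values, and the sample points are assumptions chosen for illustration only.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Illustrative data: two tight groups of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

With these assumed settings, the two tight groups each form a cluster and the isolated point is labeled as noise (`-1`). The neighbor search here is a brute-force scan, which is fine for a toy example but not for large datasets.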

81 questions

votes

4 answers

How do I solve "Failed building wheel for hdbscan"?

I tried to install hdbscan using pip install hdbscan, and I get this: ERROR: Failed building wheel for hdbscan ERROR: Could not build wheels for hdbscan which use PEP 517 and cannot be installed directly. I've tried several solutions, but none of them worked…

python pip hdbscan

asked May 01 '21 at 03:39

Omar Hossam

votes

1 answer

HDBSCAN difference between parameters

I'm confused about the difference between the following parameters in HDBSCAN: min_cluster_size, min_samples, and cluster_selection_epsilon. Correct me if I'm wrong. For min_samples, if it is set to 7, then clusters formed need to have 7 or more…

machine-learning scikit-learn cluster-analysis hierarchical-clustering hdbscan

asked Jun 09 '21 at 05:22

HR1

votes

1 answer

Issue with hdbscan (ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject)

I know a number of people have posted about this before but I still can't resolve my error. I'm trying to import hdbscan but it keeps returning the following…

python numpy jupyter-notebook hdbscan

asked Mar 17 '21 at 03:00

code_learner93

votes

3 answers

How to resolve ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects

I am trying to install bertopic and I got this error: pip install bertopic Collecting bertopic > Using cached bertopic-0.11.0-py2.py3-none-any.whl (76 kB) > Collecting hdbscan>=0.8.28 > Using cached…

python bert-language-model hdbscan

asked Jul 29 '22 at 22:16

SamanthaK

votes

1 answer

How do I use sklearn.metrics.pairwise pairwise_distances with callable metric?

I'm doing some behavior analysis where I track behaviors over time and then create n-grams of those behaviors. sample_n_gram_list = [['scratch', 'scratch', 'scratch', 'scratch', 'scratch'], ['scratch', 'scratch', 'scratch',…

python scikit-learn hdbscan

asked Dec 17 '18 at 04:26

not-bob

votes

1 answer

hdbscan error: TypeError: 'numpy.float64' object cannot be interpreted as an integer

I ran the hdbscan function code on both Linux and Google Colab and got the same error: TypeError: 'numpy.float64' object cannot be interpreted as an integer. The error seems to happen when applying data to the 'fit_predict' function. The code comes from hdbscan…

python-3.x scikit-learn cluster-analysis hdbscan

asked Jul 18 '23 at 15:25

Sotiris

votes

1 answer

Is DBSCAN or HDBSCAN the better option, and why?

Which clustering method is considered better, DBSCAN or HDBSCAN, and what is the reason behind that?

cluster-analysis dbscan hdbscan

asked Nov 24 '20 at 05:39

Mahnaz Rafia Islam

votes

1 answer

Problems with HDBSCAN and approximate predict

I would like to use the HDBSCAN clustering technique to predict outliers. I have trained my model to optimize the parameters, but then, when I apply approximate_predict on new data, I get different clusters and labels than I have in my original…

python cluster-analysis prediction hdbscan

asked Mar 23 '20 at 14:49

Ariadna Fernández

votes

2 answers

What is the appropriate distance metric when clustering paragraph/doc2vec vectors?

My intent is to cluster document vectors from doc2vec using HDBSCAN. I want to find tiny clusters where there are semantic and textual duplicates. To do this I am using gensim to generate document vectors. The elements of the resulting docvecs are…

python cluster-analysis distance doc2vec hdbscan

asked Oct 09 '18 at 13:35

fluffet

votes

0 answers

Clustering with UMAP and HDBSCAN

I have a somewhat large amount of textual data, input by approximately 5000 people. I've assigned each person a vector using Doc2vec, reduced to two dimensions using UMAP and highlighted groups contained within using HDBSCAN. The intention is to…

python matplotlib nlp hdbscan runumap

asked Jul 15 '21 at 17:21

Jacob

votes

0 answers

Problem with hdbscan used with bertopic: OSError: [Errno 22] Invalid argument

I am writing because I have a problem (silly and obvious introduction, I know). I am trying to use the BERTopic package using the Python interpreter in RStudio and the reticulate extension: Python 3.6.13…

python r oserror hdbscan

asked Apr 29 '21 at 16:54

Francis

votes

1 answer

HDBSCAN for R crashes with large dataset

I tried to apply the HDBSCAN algorithm to my dataset (50000 GPS points). However, every time I run the code, the R session crashes. Here is the basic info about my PC: processor: Intel i7 7820x 3.6 GHz memory: 120 GB System: 64-bit Operating system,…

r gps dbscan hdbscan

asked May 19 '19 at 13:42

Yunzhe Liu

votes

1 answer

TypeError issue importing hdbscan

Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 17:59:51) [MSC v.1935 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import hdbscan Traceback (most recent call last): File…

python hdbscan

asked Jul 03 '23 at 01:01

Nathan Luo

votes

1 answer

Scikit HDBSCAN tree labeling (not single-slice labeling)

BLUF: For a specific epsilon (or for HDBSCAN's 'favorite' epsilon), I can extract the mapping of my data in that epsilon's partition. But how can I see my data's full tree membership? I've gotten a ton out of the terrific tutorial here. In scikit…

scikit-learn data-science cluster-analysis hierarchical-clustering hdbscan

asked Feb 21 '22 at 20:02

Sam Greenberg

votes

1 answer

HDBSCAN handling of large datasets

I am trying to implement a clustering on a large dataset consisting of 146,000 observations, using the HDBSCAN algorithm. When I cluster these observations with the (default) Minkowski/Euclidean distance measure, clustering of the entire data goes…

python cluster-analysis hdbscan

asked Nov 29 '21 at 12:57

statsguy96
