
I tried to apply the HDBSCAN algorithm to my dataset (50,000 GPS points). However, every time I run the code, the R session crashes.

Here is the basic information about my PC:

processor: Intel i7 7820x 3.6 GHz
memory: 120 GB
System: 64-bit Operating system, x64-based processor

Here is the subset of my dataframe (df):

       Hour       lon      lat
   19:49:19 -73.97868 40.76272
   03:07:49 -74.00217 40.73429
   00:53:36 -74.00869 40.73819
   16:51:35 -73.94724 40.77943
   20:12:39 -73.86382 40.76952
   13:20:07 -74.00842 40.74652
   21:52:18 -74.00845 40.72110
   02:08:07 -73.93993 40.70765
   19:47:01 -73.98917 40.72040
   18:55:11 -74.00297 40.76039
   22:30:02 -73.97443 40.74751
   15:29:26 -73.96956 40.76112
   22:44:05 -73.97282 40.75642
   07:57:17 -73.99771 40.73627
   19:33:36 -73.95992 40.77361

And here is my HDBSCAN code:

library(dbscan) # assumed source of hdbscan(); the package is not named above

cl <- hdbscan(df[, 2:3], minPts = 0.01 * 50000) # keep minPts at 1% of the total number of points

plot(df[, 2:3], col = cl$cluster + 1, pch = 20) # plot the results

I tried to reduce the number of points from the original dataset:


library(dplyr) # provides sample_n()

df1 <- sample_n(df, 45000) # random subset of 45,000 points
cl <- hdbscan(df1[, 2:3], minPts = 0.01 * 45000)

plot(df1[, 2:3], col = cl$cluster + 1, pch = 20)

This works fine.

I find that once the total number of points reaches 50,000, R begins to crash. Is there any solution for this? Thanks.

Yunzhe Liu

1 Answer


This is likely not a usage failure, but a programming failure in the module.

It's fairly common to see a 32-bit integer overflow at this size, because 50000² cannot be stored in a signed 32-bit integer. A typical cutoff is around 46341. Any chance that 46342 is the first size to fail? If so, you would likely need to rewrite that module to use 64-bit counters. And of course, the overflow needs to be detected properly. So you should file a proper bug report rather than ask in a Q&A forum like this.
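
To make the arithmetic concrete, here is a minimal R sketch of the suspected overflow. It only checks which sizes fit in a signed 32-bit integer. That the module computes something like n² or n·(n−1) in a 32-bit counter is an assumption, not something confirmed from its source, and the names `int_max` and `sizes` are purely illustrative.

```r
# Largest value a signed 32-bit integer can hold (2^31 - 1).
int_max <- .Machine$integer.max

# Candidate dataset sizes, stored as doubles so the arithmetic
# below cannot itself overflow.
n <- c(45000, 46340, 46341, 46342, 50000)

# Assumption: the module needs either n^2 entries (full matrix)
# or n * (n - 1) ordered pairs. Check which of these fit in 32 bits.
sizes <- data.frame(
  n            = n,
  n_squared    = n^2,
  n_pairs      = n * (n - 1),
  fits_squared = n^2 <= int_max,
  fits_pairs   = n * (n - 1) <= int_max
)
print(sizes)

# R's own integer type is 32-bit and reports the overflow as NA
# (with a warning); compiled code usually wraps around silently.
overflowed <- suppressWarnings(50000L * 50000L)
print(is.na(overflowed))
```

Under these assumptions, 45000 fits either way and 50000 does not, which matches what the question reports. If the counter holds n·(n−1), 46342 would be the first size to fail; if it holds n², the first failure would be 46341.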

As a workaround, you can try the HDBSCAN* implementations for Python and ELKI to see if they scale better. It shouldn't be necessary to use 32-bit matrices. Nevertheless, go report the bug!

Has QUIT--Anony-Mousse