I tried to apply the HDBSCAN algorithm to my dataset (50,000 GPS points). However, every time I run the code, the R session crashes.
Here is the basic information about my PC:
processor: Intel i7 7820x 3.6 GHz
memory: 120 GB
System: 64-bit Operating system, x64-based processor
Here is a subset of my data frame (df):
Hour lon lat
19:49:19 -73.97868 40.76272
03:07:49 -74.00217 40.73429
00:53:36 -74.00869 40.73819
16:51:35 -73.94724 40.77943
20:12:39 -73.86382 40.76952
13:20:07 -74.00842 40.74652
21:52:18 -74.00845 40.72110
02:08:07 -73.93993 40.70765
19:47:01 -73.98917 40.72040
18:55:11 -74.00297 40.76039
22:30:02 -73.97443 40.74751
15:29:26 -73.96956 40.76112
22:44:05 -73.97282 40.75642
07:57:17 -73.99771 40.73627
19:33:36 -73.95992 40.77361
And here is my HDBSCAN code:
library(dbscan) # assumption: hdbscan() here is the one from the dbscan package
cl <- hdbscan(df[, 2:3], minPts = 0.01 * 50000) # keep minPts at 1% of the total number of points
plot(df[, 2:3], col = cl$cluster + 1, pch = 20) # plot the results, colored by cluster
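The 1% rule in the comment above can be tied to the data frame itself, so that `minPts` follows the number of rows instead of a hard-coded 50000. This is only a minimal sketch in base R; `min_pts_for` and its `fraction` argument are hypothetical names introduced for illustration and are not part of any package.

```r
# Hypothetical helper: derive minPts as a fraction of the number of rows.
# The floor of 2 is an assumption, added so tiny test subsets still get a usable value.
min_pts_for <- function(data, fraction = 0.01) {
  max(2L, as.integer(round(fraction * nrow(data))))
}

# Small stand-in data frame with the same column layout as df (Hour, lon, lat).
toy <- data.frame(
  Hour = rep("00:00:00", 1000),
  lon  = seq(-74.01, -73.86, length.out = 1000),
  lat  = seq(40.70, 40.78, length.out = 1000)
)

min_pts_for(toy)  # 1% of 1000 rows
```

With the real data this would be passed along as `hdbscan(df[, 2:3], minPts = min_pts_for(df))`, assuming `hdbscan()` comes from the dbscan package.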
I tried to reduce the number of points from the original dataset:
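library(dplyr) # assumption: sample_n() here is the one from dplyr
df1 <- sample_n(df, 45000) # random subset of 45,000 rows
cl <- hdbscan(df1[, 2:3], minPts = 0.01 * 45000) # minPts is again 1% of the points
plot(df1[, 2:3], col = cl$cluster + 1, pch = 20)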
This works fine.
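For a sense of scale between the two runs, the sketch below does the pairwise-distance arithmetic for a given number of points in base R. It runs no clustering. Whether either quantity has anything to do with the crash is an assumption to check, not a diagnosis, since how `hdbscan()` stores or indexes distances internally is not established here. `pairwise_stats` is a hypothetical name used only for this illustration.

```r
# Back-of-the-envelope arithmetic for n points; nothing is clustered here.
pairwise_stats <- function(n) {
  n <- as.numeric(n)            # doubles, so the arithmetic itself cannot overflow
  pairs <- n * (n - 1) / 2      # number of distinct point pairs
  list(
    pairs = pairs,
    gib = pairs * 8 / 2^30,     # size if every pair were stored as an 8-byte double
    n_squared_fits_int32 = n^2 <= .Machine$integer.max  # would n * n fit in a 32-bit integer?
  )
}

stats_45k <- pairwise_stats(45000)
stats_50k <- pairwise_stats(50000)

str(stats_45k)
str(stats_50k)
```

The two sizes fall on opposite sides of the 32-bit limit for `n * n` (45000 squared fits, 50000 squared does not), which may or may not be relevant to what the package does internally.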
I find that once the total number of points reaches 50,000 or more, R begins to crash. Is there any solution for this? Thanks.