I am trying to apply a clustering algorithm in R, and I have read a basic introduction to using DBSCAN in R. My data consist of start/finish locations and times (more than 50k rows).
This is what the sample looks like:
```
# A tibble: 10 x 6
   start_location_Long start_location_Lat end_location_Long end_location_Lat start_time1_cos end_time1_cos
                 <dbl>              <dbl>             <dbl>            <dbl>           <dbl>         <dbl>
 1                101.               13.9              101.             13.9          -0.978        -0.998
 2                101.               13.9              101.             13.8          -0.465         0.503
 3                101.               13.9              101.             13.9          -0.756        -0.982
 4                101.               13.8              101.             13.8          -0.827        -0.773
 5                101.               13.8              101.             13.8          -0.956        -0.949
 6                101.               13.8              101.             13.8          -0.969        -0.961
 7                101.               13.8              101.             13.8          -0.946        -0.521
 8                101.               13.8              101.             13.7          -0.972        -0.910
 9                101.               13.7              101.             13.7          -0.840        -0.837
10                101.               13.8              101.             13.7          -0.497        -0.313
```
```r
data <- structure(list(
  start_location_Long = c(100.60066, 100.60039, 100.56864, 100.59018, 100.55926,
                          100.61014, 100.61504, 100.75646, 100.56093, 100.52679),
  start_location_Lat = c(13.91761, 13.91746, 13.88542, 13.7969, 13.83207,
                         13.82256, 13.80237, 13.82296, 13.73084, 13.76592),
  end_location_Long = c(100.59982, 100.53864, 100.57354, 100.59309, 100.56502,
                        100.56652, 100.65582, 100.73325, 100.56094, 100.53465),
  end_location_Lat = c(13.91616, 13.8288, 13.86449, 13.84172, 13.82841,
                       13.82762, 13.82176, 13.72228, 13.73224, 13.74595),
  start_time1_cos = c(-0.977783236758606, -0.464584475495966, -0.756281834105734,
                      -0.827489114105152, -0.955963918764982, -0.968565073328525,
                      -0.946485086708269, -0.971772589428584, -0.839856789165117,
                      -0.497478722371776),
  end_time1_cos = c(-0.998416312411851, 0.502642787734849, -0.98199994355324,
                    -0.772641247513493, -0.949334100771872, -0.960940326679488,
                    -0.521319957219796, -0.910443172287846, -0.837480354951308,
                    -0.313301931309727)),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
```
Based on the post "Choosing eps and minpts for DBSCAN (R)?", I scaled my data, set minPts to 4, and tried to choose eps from the sorted kNN distances.
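For reference, the scaling and kNN-distance step can be sketched in base R on the 10-row sample. This is a sketch under stated assumptions: each column is standardized with `scale()` (mean 0, standard deviation 1), minPts = 4, and, because the dbscan package counts the point itself in `minPts`, the distance of interest is to the 3rd nearest *other* point. `dist()` stores the full pairwise matrix, so this version is only practical for small samples.

```r
# 10-row sample from the question (values copied from the dput() output)
trips <- data.frame(
  start_location_Long = c(100.60066, 100.60039, 100.56864, 100.59018, 100.55926,
                          100.61014, 100.61504, 100.75646, 100.56093, 100.52679),
  start_location_Lat = c(13.91761, 13.91746, 13.88542, 13.7969, 13.83207,
                         13.82256, 13.80237, 13.82296, 13.73084, 13.76592),
  end_location_Long = c(100.59982, 100.53864, 100.57354, 100.59309, 100.56502,
                        100.56652, 100.65582, 100.73325, 100.56094, 100.53465),
  end_location_Lat = c(13.91616, 13.8288, 13.86449, 13.84172, 13.82841,
                       13.82762, 13.82176, 13.72228, 13.73224, 13.74595),
  start_time1_cos = c(-0.977783236758606, -0.464584475495966, -0.756281834105734,
                      -0.827489114105152, -0.955963918764982, -0.968565073328525,
                      -0.946485086708269, -0.971772589428584, -0.839856789165117,
                      -0.497478722371776),
  end_time1_cos = c(-0.998416312411851, 0.502642787734849, -0.98199994355324,
                    -0.772641247513493, -0.949334100771872, -0.960940326679488,
                    -0.521319957219796, -0.910443172287846, -0.837480354951308,
                    -0.313301931309727)
)

# Standardize every column to mean 0 and standard deviation 1
x <- scale(as.matrix(trips))

# Assumed minPts = 4; minPts includes the point itself, so use the 3rd other point
min_pts <- 4
k <- min_pts - 1

# Full pairwise Euclidean distance matrix (O(n^2) memory: small samples only)
d <- as.matrix(dist(x))
diag(d) <- Inf  # exclude each point's zero distance to itself

# Distance from every point to its k-th nearest neighbour
knn_dist <- apply(d, 1, function(row) sort(row)[k])

# Sorted kNN distances; a sharp bend ("knee") is the usual eps candidate
knn_sorted <- sort(knn_dist)
print(round(knn_sorted, 3))
```

Plotting `knn_sorted` (for example with `plot(knn_sorted, type = "b")`) gives the usual kNN distance plot; the eps candidate is read off where the curve starts to rise steeply.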
However, all points always end up merged into a single cluster, even though I have tried many different values of minPts and eps.
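When every point ends up in one cluster, it can help to sweep eps and tabulate, for each value, the number of clusters, the number of noise points and the size of the largest cluster. A minimal sketch, assuming the CRAN dbscan package (where cluster 0 marks noise); `eps_sweep` is a hypothetical helper name, and the eps grid in the usage comment is an arbitrary choice.

```r
# Hypothetical helper: run dbscan::dbscan() once per eps value and summarize
eps_sweep <- function(x, eps_grid, min_pts = 4) {
  do.call(rbind, lapply(eps_grid, function(eps) {
    cl <- dbscan::dbscan(x, eps = eps, minPts = min_pts)$cluster
    data.frame(
      eps        = eps,
      n_clusters = length(setdiff(unique(cl), 0L)),  # cluster 0 = noise
      n_noise    = sum(cl == 0L),
      largest    = if (any(cl > 0L)) max(tabulate(cl)) else 0L  # biggest cluster
    )
  }))
}

# Usage (x = the scaled feature matrix):
# eps_sweep(x, eps_grid = seq(0.1, 2, by = 0.1), min_pts = 4)
```

If the table jumps straight from mostly noise to one cluster containing nearly every point, with no eps in between that gives several clusters, that suggests the points have no clear density gaps in this scaled feature space at that minPts, rather than that one particular eps was mis-chosen.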
Could anyone with experience using DBSCAN help me? How should I cluster this data? Since the full dataset is very large and the small sample above may not be representative, I have also provided the raw data here.
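For the full dataset (more than 50k rows), a complete distance matrix is impractical: `dist()` stores n(n-1)/2 doubles, roughly 10 GB for 50,000 rows. The dbscan package's `kNN()` finds nearest neighbours with a kd-tree instead. A sketch, assuming the dbscan package and minPts = 4; `sorted_knn_dist` is a hypothetical helper, and `trips` in the usage comment stands for the full data frame.

```r
# Hypothetical helper (not part of dbscan): sorted k-NN distances computed
# with dbscan::kNN()'s kd-tree search instead of a full dist() matrix
sorted_knn_dist <- function(x, min_pts = 4) {
  k  <- min_pts - 1              # minPts counts the point itself
  nn <- dbscan::kNN(x, k = k)    # self-matches are excluded for data input
  sort(nn$dist[, k])             # distance to the k-th neighbour, ascending
}

# Usage on the full data (`trips` is assumed to be the full data frame):
# x  <- scale(as.matrix(trips))
# kd <- sorted_knn_dist(x, min_pts = 4)
# plot(kd, type = "l", ylab = "3-NN distance")  # look for the knee
# quantile(kd, c(0.5, 0.9, 0.99))
```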
Thank you in advance.