I have a dataset of points:
lat    long     time
34.53  -126.34  1
34.52  -126.32  2
34.51  -126.31  3
34.54  -126.36  4
34.59  -126.28  5
34.63  -126.14  6
34.70  -126.05  7
...
(Much larger dataset, but this is the general structure.)
I want to cluster points based on distance and time. DBSCAN seems like a good choice, since I don't know how many clusters there are.
For eps I am currently using minute/5500 (which I believe is approximately 20 meters, scaled).
library(fpc)
results <- dbscan(data, MinPts = 3, eps = 0.00045, method = "raw", scale = FALSE, showplot = 1)
I am having trouble understanding how the scaling and the distance are determined, since I am passing raw data. I can guess at values for eps with or without scaling, but I am unclear what the scaling does, or what distance metric is being used (Euclidean distance, perhaps?). Is there documentation on this somewhere?
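To make the question concrete, here is a small base R sketch of what I am guessing happens. It is an assumption on my part, not confirmed behaviour of fpc's dbscan: I assume the metric is plain Euclidean distance over all columns, and that scaling means something like base R's scale() (center each column and divide by its standard deviation). The names pts, d_raw, d_scaled and neighbours_raw are just for illustration.

```r
# Sample rows from the table above (lat, long, time).
pts <- data.frame(
  lat  = c(34.53, 34.52, 34.51, 34.54, 34.59, 34.63, 34.70),
  long = c(-126.34, -126.32, -126.31, -126.36, -126.28, -126.14, -126.05),
  time = 1:7
)

# ASSUMPTION: Euclidean distance over all three raw columns,
# so degrees and time units are mixed in one number.
d_raw <- as.matrix(dist(pts, method = "euclidean"))

# ASSUMPTION: "scaled" means each column is centered and divided
# by its standard deviation before the distances are computed.
d_scaled <- as.matrix(dist(scale(pts), method = "euclidean"))

# Points within eps of the first point on the raw data
# (the point itself is always included, at distance 0).
eps <- 0.00045
neighbours_raw <- which(d_raw[1, ] <= eps)

print(round(d_raw[1, 1:3], 5))
print(round(d_scaled[1, 1:3], 5))
print(neighbours_raw)
```

If this guess is right, the time column (steps of 1) would dominate the raw distances, and an eps chosen in degrees would never reach a neighbouring row, which is part of why I want to know what the function really computes.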
(This is not about finding an automated way to choose eps and MinPts, as in "Choosing eps and minpts for DBSCAN (R)?", but about what the different values mean. Saying "You need a distance function first" doesn't explain what distance function is being used, or how to create one.)