I've recently been studying DBSCAN in R for transit research, and I'm hoping someone can help me with this particular dataset.

A few rows of my dataset are shown below.

      BTIME ATIME
1029  20001 21249
2944  24832 25687
6876  25231 26179
11120 20364 21259
11428 25550 26398
12447 24208 25172

What I am trying to do is cluster these data using BTIME as the x-axis and ATIME as the y-axis. Each pair of BTIME and ATIME represents the boarding time and arrival time of a subway passenger.

For more context, here is the scatter plot of my total dataset.

[Image: Scatter plot of my data set]

However, if I split my dataset into smaller time periods, the scatter plot looks like this. I will call this the sample dataset.

[Image: Scatter plot in larger scale]

If I perform DBSCAN clustering on the data in the second image (the sample dataset), the clustering works as expected.
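
A minimal sketch of this kind of run, assuming the `dbscan` package and using only the six example rows listed above. The `eps` and `minPts` values are placeholders picked for these six rows, not the settings behind the plots.

```r
library(dbscan)  # non-base package, assumed to be installed

# The six example rows from the table above (row names dropped)
x <- cbind(
  BTIME = c(20001, 24832, 25231, 20364, 25550, 24208),
  ATIME = c(21249, 25687, 26179, 21259, 26398, 25172)
)

# eps is in the same units as BTIME/ATIME; both values are placeholders
res <- dbscan(x, eps = 1000, minPts = 2)

# res$cluster holds one label per row; 0 marks noise points
print(res$cluster)

# Colour each point by its cluster label (noise gets its own colour)
plot(x, col = res$cluster + 1L, pch = 19)
```

On the real data the same call applies to the full two-column matrix; only `eps` and `minPts` would change.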

However, it seems that DBSCAN cannot cluster the total dataset at this smaller scale, maybe because the data is too dense.

So my question is: is there a way I can perform clustering on the total dataset? What criteria should be used to split the data into time periods?

I think the total dataset is highly dense, which is why I tried clustering on a sample time period.

If I separate my total data into smaller time periods, how would I choose the hyperparameters for each separated dataset? Looking at the data, the distribution is similar in both the total dataset and the separated sample dataset.

I would sincerely appreciate any advice.
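
One common heuristic for `eps` is to look at sorted k-nearest-neighbour distances (with k = `minPts` - 1) and pick the distance at the "knee" of the curve. Below is a rough sketch of computing those distances per time window, assuming the `dbscan` package. The window boundaries in `breaks` and the `min_pts` value are hypothetical, chosen only so that the six example rows fall into two groups.

```r
library(dbscan)  # non-base package, assumed to be installed

# The six example rows from the table above (row names dropped)
x <- cbind(
  BTIME = c(20001, 24832, 25231, 20364, 25550, 24208),
  ATIME = c(21249, 25687, 26179, 21259, 26398, 25172)
)

# Hypothetical window boundaries on BTIME (illustration only)
breaks <- c(20000, 22500, 27000)
window <- cut(x[, "BTIME"], breaks = breaks)

min_pts <- 2  # placeholder; k for the k-NN distance is min_pts - 1

# For each window, the sorted distance of every point to its k-th neighbour
kd_by_window <- lapply(split(as.data.frame(x), window), function(w) {
  sort(kNNdist(as.matrix(w), k = min_pts - 1))
})

print(kd_by_window)
```

With a realistic number of points per window, plotting each sorted vector (or calling `kNNdistplot()` on the window's matrix) shows whether the knee sits at a similar distance across windows, which would suggest a shared `eps`.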

Yun Hyunsoo
  • 2
    Perhaps use a smaller `eps`? Also, please provide a snippet of your code along with minimal and [reproducible example(s)](https://stackoverflow.com/a/5963610/10802499). Use `dput()` for data and specify all non-base packages with `library()` calls. – ekoam Oct 28 '20 at 19:16
  • 2
    What's compelling you to use DBSCAN for identifying clusters? Based on your total dataset, it would be the last algorithm I would use to identify interesting populations. What's your end goal? – Dewey Brooke Nov 01 '20 at 00:19
  • Hi! What do you do when you "split" your dataset? And what do the intervals between your clusters mean? The nights, for example? If so, there is no need to use an algorithm, just check the dates and hours ... – MrSmithGoesToWashington Nov 03 '20 at 09:22

0 Answers