There are several ways to approach this, but let's start from a known point:
dat <- data.frame(
  hour = c("5:00:00", "6:00:00", "7:00:00"),
  attraction = c(1, 3, 6)
)
dat$hour <- as.POSIXct(dat$hour, format = "%H:%M:%S")
dat
# hour attraction
# 1 2020-01-12 05:00:00 1
# 2 2020-01-12 06:00:00 3
# 3 2020-01-12 07:00:00 6
Since you're doing time-based calculations, I converted hour to a POSIXt type. Because the format string contains only a time, as.POSIXct fills in the current date, which is why 2020-01-12 appears in the output. (If your data also has a "date" component, you'll want to include it in the conversion, but if everything always falls on the same day, it does not really matter.)
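If your data does carry a date, one way to include it is to paste the date and time together and parse them in a single call. This is only a sketch: the date column and the dat_dated name are assumptions for illustration, not part of the original data.

```r
# Hypothetical input: the `date` column is an assumption, not in the question's data.
dat_dated <- data.frame(
  date = c("2020-01-12", "2020-01-12", "2020-01-13"),
  hour = c("5:00:00", "6:00:00", "7:00:00"),
  attraction = c(1, 3, 6)
)

# Combine date and time into one string, then parse both with one format.
dat_dated$hour <- as.POSIXct(
  paste(dat_dated$date, dat_dated$hour),
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"  # an explicit time zone keeps results the same across machines
)
dat_dated
```

Setting tz explicitly is optional, but it avoids surprises when the code runs in a different locale.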
From here, we can introduce random minutes for each arrival:
set.seed(42)
dat2 <- do.call(
  "rbind.data.frame",
  Map(function(hr, n) data.frame(hour = hr, min = round(runif(n, min = 0, max = 59))),
      dat$hour, dat$attraction)
)
dat2
# hour min
# 1 2020-01-12 05:00:00 54
# 2 2020-01-12 06:00:00 55
# 3 2020-01-12 06:00:00 17
# 4 2020-01-12 06:00:00 49
# 5 2020-01-12 07:00:00 38
# 6 2020-01-12 07:00:00 31
# 7 2020-01-12 07:00:00 43
# 8 2020-01-12 07:00:00 8
# 9 2020-01-12 07:00:00 39
# 10 2020-01-12 07:00:00 42
I don't know whether you need the minute on its own or as a full timestamp, so perhaps:
dat2$arrival_time <- dat2$hour + (60 * dat2$min)
dat2
# hour min arrival_time
# 1 2020-01-12 05:00:00 54 2020-01-12 05:54:00
# 2 2020-01-12 06:00:00 55 2020-01-12 06:55:00
# 3 2020-01-12 06:00:00 17 2020-01-12 06:17:00
# 4 2020-01-12 06:00:00 49 2020-01-12 06:49:00
# 5 2020-01-12 07:00:00 38 2020-01-12 07:38:00
# 6 2020-01-12 07:00:00 31 2020-01-12 07:31:00
# 7 2020-01-12 07:00:00 43 2020-01-12 07:43:00
# 8 2020-01-12 07:00:00 8 2020-01-12 07:08:00
# 9 2020-01-12 07:00:00 39 2020-01-12 07:39:00
# 10 2020-01-12 07:00:00 42 2020-01-12 07:42:00
I should note that your use of rnorm "can" result in negative minutes (or minutes past the end of the hour), since the normal distribution is unbounded; using sd=10 reduces the likelihood, certainly, but if you need the random arrival time to "always" be within the specified hour, then either your use of runif is better or you might consider a truncated-normal distribution such as the one provided by the truncnorm package.
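To illustrate the truncated-normal idea without an extra dependency, here is a base-R sketch that uses inverse-CDF sampling. The helper name rnorm_trunc and the mean of 30 are my own assumptions; only sd = 10 comes from your code. If you install truncnorm, its rtruncnorm function (with a and b as the bounds) should serve the same purpose.

```r
# Hypothetical helper (not from the truncnorm package): draw n values from a
# normal distribution restricted to [lower, upper] via inverse-CDF sampling.
rnorm_trunc <- function(n, mean = 30, sd = 10, lower = 0, upper = 59) {
  p_lo <- pnorm(lower, mean, sd)  # probability mass below the lower bound
  p_hi <- pnorm(upper, mean, sd)  # probability mass below the upper bound
  # Uniform draws between the two probabilities map back to values in bounds.
  qnorm(runif(n, p_lo, p_hi), mean, sd)
}

set.seed(42)
mins <- round(rnorm_trunc(1000))
range(mins)  # always stays within 0 and 59
```

Unlike clamping out-of-range draws to the bounds, this does not pile up extra probability at 0 and 59.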
Note: I use Map, which is a multi-argument version of lapply. There are often advantages (sometimes in performance, sometimes in readability) to using functions from R's apply family. The performance gap has mostly closed (historically, for was often slower than sapply), but some still find the *apply functions better. In the case of Map, I've written a few answers explaining (by "unrolling" it) how it works: https://stackoverflow.com/a/57367292 and https://stackoverflow.com/a/54485425.
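As a quick illustration of that "unrolling", here is a toy example; the add_pair function is hypothetical and only there to show the pairing of arguments.

```r
# Hypothetical toy function taking one element from each input vector.
add_pair <- function(x, y) x + y

# Map walks both vectors in parallel and returns a list.
res_map <- Map(add_pair, 1:3, 4:6)

# "Unrolled", the call above does the equivalent of:
res_manual <- list(
  add_pair(1L, 4L),
  add_pair(2L, 5L),
  add_pair(3L, 6L)
)

identical(res_map, res_manual)
```

In the answer's code, hr and n play the roles of x and y: each hour is paired with its own arrival count.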
To get occupancy rates (how many cars in a given period), I suggest you use cut to bin the arrival times. We can create bin boundaries with something like:
myseq <- round(range(dat2$arrival_time) + c(-1800,1800), "hour")
myseq
# [1] "2020-01-12 05:00:00 PST" "2020-01-12 08:00:00 PST"
myseq <- seq.POSIXt(myseq[1], myseq[2], by = "min")
length(myseq)
# [1] 181
myseq <- myseq[seq_along(myseq) %% 10 == 1]
myseq
# [1] "2020-01-12 05:00:00 PST" "2020-01-12 05:10:00 PST" "2020-01-12 05:20:00 PST"
# [4] "2020-01-12 05:30:00 PST" "2020-01-12 05:40:00 PST" "2020-01-12 05:50:00 PST"
# [7] "2020-01-12 06:00:00 PST" "2020-01-12 06:10:00 PST" "2020-01-12 06:20:00 PST"
# [10] "2020-01-12 06:30:00 PST" "2020-01-12 06:40:00 PST" "2020-01-12 06:50:00 PST"
# [13] "2020-01-12 07:00:00 PST" "2020-01-12 07:10:00 PST" "2020-01-12 07:20:00 PST"
# [16] "2020-01-12 07:30:00 PST" "2020-01-12 07:40:00 PST" "2020-01-12 07:50:00 PST"
# [19] "2020-01-12 08:00:00 PST"
The first command finds the range of times and rounds it outward to whole hours. (Adding c(-1800,1800) before rounding makes round act as a floor on the earliest time and a ceiling on the latest. This might hit corner cases that are imperfect, but it should work most of the time.) The second command creates a per-minute sequence, 181 elements long here (three hours plus the end point). The third command keeps every 10th element, leaving one boundary every 10 minutes.
You should be able to easily adjust these three commands to your needs.
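If you don't need the per-minute sequence for anything else, the second and third steps can likely be collapsed, since seq accepts a step such as "10 min" for date-times. A minimal sketch, with the start and end values hard-coded as assumptions (in practice they would come from the rounded range):

```r
# Assumed bounds for illustration; normally taken from the rounded arrival range.
start <- as.POSIXct("2020-01-12 05:00:00", tz = "UTC")
end   <- as.POSIXct("2020-01-12 08:00:00", tz = "UTC")

# One boundary every 10 minutes, both end points included.
myseq10 <- seq(start, end, by = "10 min")
length(myseq10)
```

Changing the bin width is then a matter of editing the by argument, for example "15 min".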
From here, you can use
cut(dat2$arrival_time, myseq)
# [1] 2020-01-12 05:50:00 2020-01-12 06:50:00 2020-01-12 06:10:00 2020-01-12 06:40:00
# [5] 2020-01-12 07:30:00 2020-01-12 07:30:00 2020-01-12 07:40:00 2020-01-12 07:00:00
# [9] 2020-01-12 07:30:00 2020-01-12 07:40:00
# 18 Levels: 2020-01-12 05:00:00 2020-01-12 05:10:00 2020-01-12 05:20:00 ... 2020-01-12 07:50:00
which tells you the 10-minute bin each arrival belongs to. A quick summary can be done with
table(cut(dat2$arrival_time, myseq))
# 2020-01-12 05:00:00 2020-01-12 05:10:00 2020-01-12 05:20:00 2020-01-12 05:30:00
# 0 0 0 0
# 2020-01-12 05:40:00 2020-01-12 05:50:00 2020-01-12 06:00:00 2020-01-12 06:10:00
# 0 1 0 1
# 2020-01-12 06:20:00 2020-01-12 06:30:00 2020-01-12 06:40:00 2020-01-12 06:50:00
# 0 0 1 1
# 2020-01-12 07:00:00 2020-01-12 07:10:00 2020-01-12 07:20:00 2020-01-12 07:30:00
# 1 0 0 3
# 2020-01-12 07:40:00 2020-01-12 07:50:00
# 2 0
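If the wide table is awkward to work with, the counts can be turned into a data frame with one row per bin. This is a sketch on a small stand-in data set (the six 07:00 arrivals from above, with UTC assumed); the names arrivals, bins, and counts are mine.

```r
# Stand-in data: the six arrivals in the 07:00 hour, time zone assumed to be UTC.
base_time <- as.POSIXct("2020-01-12 07:00:00", tz = "UTC")
arrivals  <- base_time + 60 * c(38, 31, 43, 8, 39, 42)

# Seven boundaries give six 10-minute bins covering 07:00 to 08:00.
bins <- seq(base_time, by = "10 min", length.out = 7)

# One row per bin: the bin's start time and the number of arrivals in it.
counts <- as.data.frame(table(cut(arrivals, bins)), stringsAsFactors = FALSE)
names(counts) <- c("bin_start", "arrivals")
counts
```

A long-format result like this is usually easier to plot or join against other data than the printed table.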