One way to generate random timestamps within a range is to build the sequence of all possible timestamps in that range with the seq function, and then randomly select n of them with the sample function. For example, to generate 3 random timestamps between Jan 1, 2021 and Jan 3, 2021 at one-second resolution, you can do:
set.seed(1)
seq(as.POSIXct("2021-01-01 00:00:00"), as.POSIXct("2021-01-03 23:59:59"), by = "s") |>
  sample(3)
#[1] "2021-01-01 06:46:27 +07" "2021-01-03 04:56:32 +07"
#[3] "2021-01-02 10:33:32 +07"
Note: You can specify your own time zone with the tz argument of the as.POSIXct function.
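As a minimal sketch of that note, the same example can be run with an explicit time zone. The choice of "UTC" and the variable names start, end and x are only illustrative assumptions.

```r
set.seed(1)

# Assumption: "UTC" is just an example; use whichever zone your data needs.
start <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2021-01-03 23:59:59", tz = "UTC")

# All seconds in the range, then 3 of them picked at random.
x <- sample(seq(start, end, by = "sec"), 3)

# The sampled values keep the time zone of the inputs.
attr(x, "tzone")
```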
Using this approach, you can get 3 million random timestamps with the following steps:
- Set the start and the end of the daily range to 18:00:00 and 23:59:59, respectively.
starts <- seq(as.POSIXct("2019-01-01 18:00:00"), as.POSIXct("2021-01-01 18:00:00"),
              by = "days")
ends <- seq(as.POSIXct("2019-01-01 23:59:59"), as.POSIXct("2021-01-01 23:59:59"),
            by = "days")
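- Calculate the number of samples for each day. 3e6/ndays is generally not a whole number, so it is rounded down here; the total is then slightly below 3 million.
ndays <- length(starts)
n <- floor(3e6 / ndays)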
- Randomly select n samples from all possible timestamps on each day, and then store the samples in a list.
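sampled_timestamps <- vector("list", ndays)
for (k in seq_len(ndays)) {
  # by = "sec" gives 21600 candidates per day; by = "hours" would give only 6,
  # fewer than n, and sample() would fail
  sampled_timestamps[[k]] <- seq(starts[k], ends[k], by = "sec") |>
    sample(n)
}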
- Convert sampled_timestamps to a vector so that it can be used as a column in a data frame.
v_sampled_timestamps <- do.call("c", sampled_timestamps)
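The steps above can be sketched as a single script. Two things here are assumptions rather than part of the steps themselves: tz = "UTC" is used as an example of a zone without daylight saving time, so that each daily step lands on the same clock time, and the per-day counts are spread so that they sum to exactly 3 million (the first few days get one extra sample). The names total, counts, extra and v are illustrative.

```r
set.seed(1)

tz <- "UTC"  # assumption: a zone without daylight saving time

# One 18:00:00 start per day; each window ends at 23:59:59 the same day.
starts <- seq(as.POSIXct("2019-01-01 18:00:00", tz = tz),
              as.POSIXct("2021-01-01 18:00:00", tz = tz), by = "days")
ends <- starts + (6 * 3600 - 1)

ndays <- length(starts)
total <- 3e6

# Base count per day, plus one extra on the first `extra` days,
# so that the counts add up to `total` exactly.
counts <- rep(total %/% ndays, ndays)
extra <- total %% ndays
counts[seq_len(extra)] <- counts[seq_len(extra)] + 1

# Sample each day's seconds without replacement.
sampled <- lapply(seq_len(ndays), function(k) {
  sample(seq(starts[k], ends[k], by = "sec"), counts[k])
})

v <- do.call("c", sampled)
length(v)
```

Sampling is done without replacement within each day, so this only works while every per-day count stays at or below the 21600 seconds available in the window.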
Now you can use v_sampled_timestamps to fill in the values of the timestamps column in your data frame.
your_df$timestamps <- v_sampled_timestamps
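The assignment only works when the vector has exactly as many elements as the data frame has rows, so it may be worth checking that first. The sketch below uses small hypothetical stand-ins for your_df and v_sampled_timestamps (10 rows, an id column, a "UTC" time zone) purely for illustration. Because the vector is built day by day, it is ordered by day; the optional shuffle is an assumption about what you might want, not a required step.

```r
set.seed(1)

# Hypothetical small stand-ins for the real objects.
v_sampled_timestamps <- sample(
  seq(as.POSIXct("2019-01-01 18:00:00", tz = "UTC"),
      as.POSIXct("2019-01-01 23:59:59", tz = "UTC"), by = "sec"),
  10
)
your_df <- data.frame(id = 1:10)

# Fail early if the lengths do not match.
stopifnot(nrow(your_df) == length(v_sampled_timestamps))

# Optional: shuffle so that rows are not ordered by day.
your_df$timestamps <- sample(v_sampled_timestamps)
```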