I have a dataframe containing two different columns:

StartTime EndTime
8:00 2:00
8:00 12:00
8:00 4:00
2:00 6:00
12:00 6:00

I want to create another column "SelectedTime" that is a randomly selected time on half hour increments (1:00, 1:30, 2:00, 2:30) with an hour buffer between StartTime and EndTime. So for row one it would be a random time between 9:00 and 1:00. Is this possible?

Ryan Gary
    Do you use a 12-hour clock? Does 2:00 in row 1 mean 14:00 (afternoon)? Is EndTime always later than StartTime? – Darren Tsai Jul 20 '23 at 17:33

2 Answers

If we make two simplifying assumptions (first, that the times are whole hours in 24-hour format; second, that no row spans into the next day), then the following may work.

randTime <- function(DF) {
  # Double each value. This way, we can sample integers and then divide by two
  # to get "on the half-hour" values.
  newDF <- DF * 2

  # Next, count the valid half-hour choices per row. In doubled units they run
  # from StartTime + 2 to EndTime - 2 inclusive (an hour of buffer on either
  # side), which is EndTime - StartTime - 3 choices. Store this in a vector.
  validTimes <- newDF$EndTime - newDF$StartTime - 3

  # Now pick a random integer from 1 to validTimes for each entry
  randHalf <- vapply(validTimes, sample.int, integer(1L), size = 1L)

  # Shift by StartTime + 1 so the draw lands in [StartTime + 2, EndTime - 2],
  # then halve to get back to hours
  (randHalf + newDF$StartTime + 1) / 2
}

Using your initial data set, we get:

DF <- data.frame(StartTime = c(8, 8, 8, 14, 12),
                 EndTime = c(14, 12, 16, 18, 18))
> DF
  StartTime EndTime
1         8      14
2         8      12
3         8      16
4        14      18
5        12      18
set.seed(278L)
> randTime(DF)
# Returns one value per row, each on a half hour between StartTime + 1 and
# EndTime - 1 (for row 1, between 9 and 13).

Extra steps to format the output as times or convert the input from characters should not be too difficult.
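
As a rough sketch of those extra steps, the two helpers below convert "H:MM" strings to decimal hours and back. The names `toHours` and `toClock` are hypothetical and not part of the answer above. They assume 24-hour input, so 12-hour values such as the "2:00" in the question would first need to be shifted to 14:00.

```r
# Hypothetical helpers, assuming 24-hour "H:MM" strings.

# Convert "H:MM" strings to decimal hours, e.g. "8:30" -> 8.5
toHours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
         numeric(1L))
}

# Convert decimal hours back to "H:MM" strings, e.g. 9.5 -> "9:30"
toClock <- function(h) {
  sprintf("%d:%02d",
          as.integer(floor(h)),
          as.integer(round((h %% 1) * 60)))
}

toHours(c("8:00", "12:30"))
toClock(c(9, 9.5, 13))
```

With these, the character columns could be passed through `toHours()` before sampling and the sampled values through `toClock()` afterwards.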

Avraham

Using the accepted answer from this post:

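library(dplyr)

df %>%
  # Parse the "H:MM" strings as date-times on the current date
  mutate(across(everything(), \(x) as.POSIXct(x, format = "%H:%M"))) %>%
  # If EndTime is not after StartTime, treat EndTime as a p.m. time
  mutate(EndTime = if_else(StartTime >= EndTime,
                           EndTime + as.difftime(12, units = "hours"),
                           EndTime)) %>%
  rowwise() %>%
  # Sample one half-hour slot between StartTime + 1h and EndTime - 1h
  mutate(SelectedTime = sample(seq(StartTime + as.difftime(1, units = "hours"),
                                   EndTime - as.difftime(1, units = "hours"),
                                   by = "30 min"), 1)) %>%
  ungroup()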

# A tibble: 5 × 3
  StartTime           EndTime             SelectedTime       
  <dttm>              <dttm>              <dttm>             
1 2023-07-20 08:00:00 2023-07-20 14:00:00 2023-07-20 11:00:00
2 2023-07-20 08:00:00 2023-07-20 12:00:00 2023-07-20 11:00:00
3 2023-07-20 08:00:00 2023-07-20 16:00:00 2023-07-20 14:00:00
4 2023-07-20 02:00:00 2023-07-20 06:00:00 2023-07-20 03:00:00
5 2023-07-20 12:00:00 2023-07-20 18:00:00 2023-07-20 16:00:00

Explanation:

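  • The first mutate() converts the times from character to POSIXct.
  • The second mutate() handles rows where EndTime is not later than StartTime (rows 1, 3 and 5) by adding 12 hours to EndTime. Row 4 (2:00 to 6:00) is left unchanged, so it is read as 02:00 to 06:00.
  • sample() and seq() then pick one time at random from the half-hour steps between StartTime + 1 hour and EndTime - 1 hour.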
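
To make the last step concrete, here is a minimal base R sketch of the candidate grid for a single row. The date, the UTC time zone, and the variable names are illustrative assumptions, not taken from the answer.

```r
# Illustrative values for one row: 8:00 to 14:00 on an arbitrary date
start  <- as.POSIXct("2023-07-20 08:00", tz = "UTC")
end    <- as.POSIXct("2023-07-20 14:00", tz = "UTC")
buffer <- as.difftime(1, units = "hours")

# Every half hour from start + 1h to end - 1h, inclusive
candidates <- seq(start + buffer, end - buffer, by = "30 min")
format(candidates, "%H:%M")
# "09:00" "09:30" "10:00" "10:30" "11:00" "11:30" "12:00" "12:30" "13:00"

# Draw one candidate by index, which also works when only one candidate exists
selected <- candidates[sample.int(length(candidates), 1L)]
```

Indexing with `sample.int()` is used here instead of calling `sample()` on the vector directly; either should work for this data.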
one