0

I'm working with a simple dataframe in R containing two columns that represent a time interval:

  • Started (Date/Time)
  • Ended (Date/Time)

I want to create a column containing the duration of these time intervals where I can then group by date. The issue is some of the intervals cross midnight and thus have time durations associated with two different dates. Rather than arbitrarily grouping these by their start/end dates I'd like to find a way to include times prior to midnight in one date group and those after midnight in the next day's group.

My current approach seems inefficient, plus I'm hitting a roadblock. First I reformatted the df and created a blank column to hold duration, plus another to hold a "new end date" for performing interval operations:

  • Start.Date
  • Start.Time
  • End.Date
  • End.Time
  • Duration
  • End.Date.New

I then used a loop to find instances where the time crossed midnight to store the last second of that day 23:59:59 in the End.Date.New column"

for(i in 1:nrow(df)) {
  if(df$End.Time[i] < df$Start.Time[i]) {
    df$End.Time.New[i] = '23:59:59'}}

The idea would be that, for instances where End.Time.New != NA, I could calculate Duration using Start.Time and End.Time.New and use Start.Date as my group-by variable. I would then have to generate an identical row that added 1 day to the start time and perform a similar operation (End.Date and 00:00:00) to populate the duration column, and I haven't been able to figure out how to make this work.

Is this separate-and-loop approach the best way to achieve this or is there a more efficient strategy using functions I may not be aware of?

r_joseph
  • 11
  • 1
  • Please make this question *reproducible*. This includes sample *unambiguous* data (e.g., `dput(head(x))` or `data.frame(x=...,y=...)`) and intended output given that input. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Sep 21 '20 at 21:01
  • As a starting point, many programming languages (e.g., `SQL`) will infer that `'23:59:59'` is a timestamp and should be coerced accordingly. R is not one of them. While `>` does work with *strings* -- and it might be reliable enough when you always have 24h time formats -- it is often better (from a data science perspective) to convert your timestamps to something more *numeric*-centric, such as `POSIXt` (base R) or some time-only objects afforded by the `zoo`, `lubridate`, and (I think) `data.table` packages. – r2evans Sep 21 '20 at 21:03

0 Answers0