1

I have a data table with 3 columns, (Start, Stop, & Type). Some of the original datetimes hand off from Stop to Start smoothely, but others have gaps. I want to create new rows with a Start datetime, End datetime, & Type = 0 that fills the gaps if needed. Below is some sample data...

What I have...

LOG_START_DT    LOG_END_DT      Type
3/28/2018 9:30  3/28/2018 12:15 2
3/28/2018 13:30 3/28/2018 16:30 1
3/28/2018 17:15 3/28/2018 20:00 2
3/28/2018 21:15 3/29/2018 0:00  2
3/29/2018 0:00  3/29/2018 0:30  2
3/29/2018 1:30  3/29/2018 5:00  1

What I want...

LOG_START_DT    LOG_END_DT      Type
3/28/2018 9:30  3/28/2018 12:15 2
3/28/2018 12:16 3/28/2018 13:29 0
3/28/2018 13:30 3/28/2018 16:30 1
3/28/2018 16:31 3/28/2018 17:14 0
3/28/2018 17:15 3/28/2018 20:00 2
3/28/2018 20:01 3/28/2018 21:14 0
3/28/2018 21:15 3/29/2018 0:00  2
3/29/2018 0:00  3/29/2018 0:30  2
3/29/2018 0:31  3/29/2018 1:29  0
3/29/2018 1:30  3/29/2018 5:00  1

Also, it's important to note that whatever rows are added do not have a time that overlaps with the previous end or next start date time. My original data is about 500 rows too which I've tried to do a combination for loops or if statements, but can't figure it out or it takes way too long to run through the data.

Thank you!

PVic
  • 417
  • 1
  • 4
  • 13
  • Hm, that doesn't look like a smooth hand-off either. You have one ending at 16:30 and the next starting at 16:29? Btw, it's not easy to read your example into R; you might want to look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Apr 09 '18 at 16:40
  • Sharing data with `dput()` would be much friendlier. – Gregor Thomas Apr 09 '18 at 16:54

1 Answers1

1

Let's get the data and convert to datetimes.

library(tidyverse)
library(lubridate)

foo <- read_table("LOG_START_DT    LOG_END_DT      Type
3/28/2018 9:30  3/28/2018 12:15 2
3/28/2018 13:30 3/28/2018 16:30 1
3/28/2018 17:15 3/28/2018 20:00 2
3/28/2018 21:15 3/29/2018 0:00  2
3/29/2018 0:00  3/29/2018 0:30  2
3/29/2018 1:30  3/29/2018 5:00  1")


foo <- foo %>% 
  mutate(LOG_START_DT = mdy_hm(LOG_START_DT), LOG_END_DT = mdy_hm(LOG_END_DT))

Let's make an auxiliary data frame with the ends as starts and starts as ends, all with Type of 0.

bar <- data_frame(LOG_START_DT = foo$LOG_END_DT[-nrow(foo)],
                  LOG_END_DT = foo$LOG_START_DT[-1],
                  Type = 0L)
bar
#> # A tibble: 5 x 3
#>   LOG_START_DT        LOG_END_DT           Type
#>   <dttm>              <dttm>              <int>
#> 1 2018-03-28 12:15:00 2018-03-28 13:30:00     0
#> 2 2018-03-28 16:30:00 2018-03-28 17:15:00     0
#> 3 2018-03-28 20:00:00 2018-03-28 21:15:00     0
#> 4 2018-03-29 00:00:00 2018-03-29 00:00:00     0
#> 5 2018-03-29 00:30:00 2018-03-29 01:30:00     0

Then get rid of any rows that result from a "smooth hand-off" (which you do not define very well so I've defined it as "the next start is the same as the previous end"). After, (and this doesn't seem like a good idea but this gives you what you want) add a minute and subtract a minute from the two datetime columns.

bar <- bar %>% 
  filter(LOG_START_DT != LOG_END_DT) %>% 
  mutate(LOG_START_DT = LOG_START_DT + minutes(1),
     LOG_END_DT = LOG_END_DT - minutes(1))

I don't think the adjustment is a good idea because it seems to break things if the original start and end happen to be only one minute (or less) apart. But that's up to you.

Then just bind the two data frames together and sort it.

baz <- rbind(foo, bar) %>% 
  arrange(LOG_START_DT)
baz
#> # A tibble: 10 x 3
#>    LOG_START_DT        LOG_END_DT           Type
#>    <dttm>              <dttm>              <int>
#>  1 2018-03-28 09:30:00 2018-03-28 12:15:00     2
#>  2 2018-03-28 12:16:00 2018-03-28 13:29:00     0
#>  3 2018-03-28 13:30:00 2018-03-28 16:30:00     1
#>  4 2018-03-28 16:31:00 2018-03-28 17:14:00     0
#>  5 2018-03-28 17:15:00 2018-03-28 20:00:00     2
#>  6 2018-03-28 20:01:00 2018-03-28 21:14:00     0
#>  7 2018-03-28 21:15:00 2018-03-29 00:00:00     2
#>  8 2018-03-29 00:00:00 2018-03-29 00:30:00     2
#>  9 2018-03-29 00:31:00 2018-03-29 01:29:00     0
#> 10 2018-03-29 01:30:00 2018-03-29 05:00:00     1

And I suppose if you really wanted that awful date format back you could do this:

baz_FUGLY <- baz %>% 
  mutate_if(is.POSIXct, format, "%m/%d/%Y %H:%M")
ngm
  • 2,539
  • 8
  • 18