
The data frame currently looks like this:

# A tibble: 20 x 3
   Badge `Effective Date`    day_off
   <dbl> <dttm>                <int>
 1  3162 2013-01-16 00:00:00       1
 2  3162 2013-01-19 00:00:00       2
 3  3162 2013-01-21 00:00:00       3
 5  3585 2015-10-21 00:00:00       5
 6  3586 2014-05-21 00:00:00       6
 7  3586 2014-05-23 00:00:00       7

I would like to create a new row for each day between consecutive effective dates for each badge number, carrying the earlier day_off value forward, so that the result looks something like this. The data frame is huge, so resource-intensive tidyverse functions such as complete won't work.

# A tibble: 20 x 3
   Badge `Effective Date`    day_off
   <dbl> <dttm>                <int>
 1  3162 2013-01-16 00:00:00       1
 2  3162 2013-01-17 00:00:00       1
 3  3162 2013-01-18 00:00:00       1
 4  3162 2013-01-19 00:00:00       2
 5  3162 2013-01-20 00:00:00       2
 6  3162 2013-01-21 00:00:00       3
 7  3585 2015-10-21 00:00:00       5
 8  3586 2014-05-21 00:00:00       6
 9  3586 2014-05-22 00:00:00       6
 10 3586 2014-05-23 00:00:00       7

sd3184

2 Answers


You can create a table that holds, for each Badge group, a daily sequence of datetimes from the first to the last Effective_Date. A rolling join of the original data onto this table then gives the desired output:

library(data.table)

## Create reproducible example as an R object 
# Please do this yourself next time using dput. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

df <- fread('
   Badge , Effective_Date      , day_off
  3162 , 2013-01-16 00:00:00 ,      1
  3162 , 2013-01-19 00:00:00 ,      2
  3162 , 2013-01-21 00:00:00 ,      3
  3585 , 2015-10-21 00:00:00 ,      5
  3586 , 2014-05-21 00:00:00 ,      6
  3586 , 2014-05-23 00:00:00 ,      7
 ')

df[, Effective_Date := as.POSIXct(Effective_Date)]

## Rolling join

setDT(df) # required if data wasn't originally a data.table as above

df[df[, .(Effective_Date = seq(min(Effective_Date), max(Effective_Date), by = '1 day')),
       by = .(Badge)],
   on = .(Badge, Effective_Date), roll = TRUE]
#>     Badge Effective_Date day_off
#>  1:  3162     2013-01-16       1
#>  2:  3162     2013-01-17       1
#>  3:  3162     2013-01-18       1
#>  4:  3162     2013-01-19       2
#>  5:  3162     2013-01-20       2
#>  6:  3162     2013-01-21       3
#>  7:  3585     2015-10-21       5
#>  8:  3586     2014-05-21       6
#>  9:  3586     2014-05-22       6
#> 10:  3586     2014-05-23       7

Created on 2021-07-16 by the reprex package (v2.0.0)
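
The one-liner above can also be read as two separate steps. The following is only a sketch under a few assumptions: the input is rebuilt by hand from the example rows, the time zone is fixed to UTC, and the names grid and result are illustrative and do not appear in the original answer.

```r
library(data.table)

# Illustrative input, rebuilt from the example rows (tz = "UTC" is an assumption)
df <- data.table(
  Badge = c(3162, 3162, 3162, 3585, 3586, 3586),
  Effective_Date = as.POSIXct(
    c("2013-01-16", "2013-01-19", "2013-01-21",
      "2015-10-21", "2014-05-21", "2014-05-23"),
    tz = "UTC"
  ),
  day_off = c(1L, 2L, 3L, 5L, 6L, 7L)
)

# Step 1: one row per Badge per day between its first and last date
grid <- df[, .(Effective_Date = seq(min(Effective_Date),
                                    max(Effective_Date),
                                    by = "1 day")),
           by = Badge]

# Step 2: for each row of grid, take the row of df with the same Badge and
# Effective_Date, or, if there is none, the most recent earlier row of df
# for that Badge (this is what roll = TRUE does)
result <- df[grid, on = .(Badge, Effective_Date), roll = TRUE]
```

Keeping grid as its own object makes it easy to check how many rows the expansion will produce (nrow(grid)) before running the join on a large table.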

IceCreamToucan

A tidyverse way would be to use complete and fill:

library(dplyr)
library(tidyr)

df %>%
  group_by(Badge) %>%
  complete(Effective_Date = seq(min(Effective_Date), 
                                max(Effective_Date), by = '1 day')) %>%
  fill(day_off) %>%
  ungroup()

#    Badge Effective_Date      day_off
#   <int> <dttm>                <int>
# 1  3162 2013-01-16 00:00:00       1
# 2  3162 2013-01-17 00:00:00       1
# 3  3162 2013-01-18 00:00:00       1
# 4  3162 2013-01-19 00:00:00       2
# 5  3162 2013-01-20 00:00:00       2
# 6  3162 2013-01-21 00:00:00       3
# 7  3585 2015-10-21 00:00:00       5
# 8  3586 2014-05-21 00:00:00       6
# 9  3586 2014-05-22 00:00:00       6
#10  3586 2014-05-23 00:00:00       7
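
For readers who want to see what "complete, then fill" amounts to without any packages, here is a rough base R sketch of the same expand-and-carry-forward idea. It is not the method used in either answer: findInterval is swapped in to pick the last observed row on or before each day, and the helper expand_badge, the name filled, and the UTC time zone are all assumptions made for illustration.

```r
# Illustrative input, rebuilt from the example rows (tz = "UTC" is an assumption)
df <- data.frame(
  Badge = c(3162, 3162, 3162, 3585, 3586, 3586),
  Effective_Date = as.POSIXct(
    c("2013-01-16", "2013-01-19", "2013-01-21",
      "2015-10-21", "2014-05-21", "2014-05-23"),
    tz = "UTC"
  ),
  day_off = c(1L, 2L, 3L, 5L, 6L, 7L)
)

# Hypothetical helper: expand one Badge to daily rows and carry day_off forward
expand_badge <- function(d) {
  d <- d[order(d$Effective_Date), ]  # findInterval needs sorted breakpoints
  days <- seq(min(d$Effective_Date), max(d$Effective_Date), by = "1 day")
  # For each day, the index of the latest row in d dated on or before that day
  idx <- findInterval(as.numeric(days), as.numeric(d$Effective_Date))
  data.frame(Badge = d$Badge[1], Effective_Date = days, day_off = d$day_off[idx])
}

# Apply per Badge and stack the pieces
filled <- do.call(rbind, lapply(split(df, df$Badge), expand_badge))
rownames(filled) <- NULL
```

Like complete, this materialises every expanded row in memory, so it illustrates the logic rather than addressing the size concern raised in the question.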
Ronak Shah