Fill missing dates with previous value by group

Question

Let's say I have the following data.table

library(data.table)
set.seed(123)
df <- as.data.table(data.frame(date = c("2017-01-01", "2017-01-05", "2017-01-08", "2017-01-01", "2017-01-05", "2017-01-08"), 
                 value = rnorm(6),
                 mygroup = rep(LETTERS[1:2], each = 3)))

I want to fill in the missing dates with the 'last' value by group. The closest I have found was this question which shows how to do it without grouping.

all_dates <- seq(from = as.Date("2017-01-01"), 
                   to = as.Date("2017-01-08"), 
                   by = "days")

df[J(all_dates), roll=Inf]

However, I need to do this by group and using the by results in an error

Error in [.data.table(df, J(all_dates), roll = Inf, by = mygroup) : 'by' or 'keyby' is supplied but not j

Please do `df[, date := as.Date(date)]` rather than requiring as.Date be typed innumerable times. Anyway, I think `df[df[, .(date = seq(first(date), last(date), by="day")), by=mygroup], on=.(mygroup, date), roll=-Inf]` might do it..? — Frank, Aug 09 '17 at 20:23
Is it intended that each group can have a different range of dates or will it be the same for each (Jan 1-8 for this example)? In the latter case, there are some almost-dupes using CJ, like https://stackoverflow.com/a/10473931/ — Frank, Aug 09 '17 at 20:34
@Frank each group could have different ranges. Your initial suggestion currently results in an error `'on' argument should be a named atomic vector oc column names indicating which columns in 'i' should be joined with which columns in 'x'.` — cdeterman, Aug 09 '17 at 20:46
You see an error when running this in a new R session? `library(data.table); set.seed(123); df <- as.data.table(data.frame(date = c("2017-01-01", "2017-01-05", "2017-01-08", "2017-01-01", "2017-01-05", "2017-01-08"), value = rnorm(6), mygroup = rep(LETTERS[1:2], each = 3))); df[, date := as.Date(date)]; df[df[, .(date = seq(first(date), last(date), by="day")), by=mygroup], on=.(mygroup, date), roll=-Inf]` — Frank, Aug 09 '17 at 20:49
@Frank I just tried on another machine and it works. Apparently a version issue. The 1.9.6 version of `data.table` cannot do that command but the 1.10.4 version does work. — cdeterman, Aug 09 '17 at 20:52
Oh ok. I think `on = .(...)` notation was introduced later (instead of `on=c("...")`. Anyway, I can't find any dupe, so posting now. — Frank, Aug 09 '17 at 20:52

Frank · Accepted Answer · 2017-08-10T13:09:25.660

We can add mygroup as another column in the rolling join:

df[, date := as.Date(date)]

df[
  df[, .(date = seq(first(date), last(date), by="day")), by=mygroup], 
  on=.(mygroup, date), 
  roll=TRUE]

          date       value mygroup
 1: 2017-01-01 -0.56047565       A
 2: 2017-01-02 -0.56047565       A
 3: 2017-01-03 -0.56047565       A
 4: 2017-01-04 -0.56047565       A
 5: 2017-01-05 -0.23017749       A
 6: 2017-01-06 -0.23017749       A
 7: 2017-01-07 -0.23017749       A
 8: 2017-01-08  1.55870831       A
 9: 2017-01-01  0.07050839       B
10: 2017-01-02  0.07050839       B
11: 2017-01-03  0.07050839       B
12: 2017-01-04  0.07050839       B
13: 2017-01-05  0.12928774       B
14: 2017-01-06  0.12928774       B
15: 2017-01-07  0.12928774       B
16: 2017-01-08  1.71506499       B

The "rolling" always happens on the final column in on=.

If the table had more columns and we only wanted to fill back some of them...

# extend example
set.seed(1)
df[, y := rpois(.N, 1)]

# build new table
newDT = df[, .(date = seq(first(date), last(date), by="day")), by=mygroup]

roll_cols = "value"
newDT[, (roll_cols) := 
  df[newDT, on=.(mygroup, date), roll=TRUE, mget(paste0("x.", roll_cols))]]

noroll_cols = "y"
newDT[df, on=.(mygroup, date), (noroll_cols) := mget(paste0("i.", noroll_cols)) ]

    mygroup       date       value  y
 1:       A 2017-01-01 -0.56047565  0
 2:       A 2017-01-02 -0.56047565 NA
 3:       A 2017-01-03 -0.56047565 NA
 4:       A 2017-01-04 -0.56047565 NA
 5:       A 2017-01-05 -0.23017749  1
 6:       A 2017-01-06 -0.23017749 NA
 7:       A 2017-01-07 -0.23017749 NA
 8:       A 2017-01-08  1.55870831  1
 9:       B 2017-01-01  0.07050839  2
10:       B 2017-01-02  0.07050839 NA
11:       B 2017-01-03  0.07050839 NA
12:       B 2017-01-04  0.07050839 NA
13:       B 2017-01-05  0.12928774  0
14:       B 2017-01-06  0.12928774 NA
15:       B 2017-01-07  0.12928774 NA
16:       B 2017-01-08  1.71506499  2

This is very close, but I want to have it fill from the prior value (e.g. the `2017-01-02` `value` should be `-0.56047565`, — cdeterman, Aug 10 '17 at 12:59
Ah doh, I misread. Switching to `roll=TRUE` should do that. I'll fix. Helps to see desired output generally, fyi. — Frank, Aug 10 '17 at 13:07

Fill missing dates with previous value by group

1 Answers1