1

Let say, we have the following

library(data.table); library(zoo)
dt <- data.table(grp = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3), period = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2014-05-01'), by = 'month'), x=c(1:15), y=c(11:25))
dt[, period:=as.yearmon(period, '%Y-%m-%d')]

return,

    grp   period  x  y
 1:   1 Jan 2014  1 11
 2:   1 Feb 2014  2 12
 3:   1 Mar 2014  3 13
 4:   1 Apr 2014  4 14
 5:   1 May 2014  5 15
 6:   2 Jan 2014  6 16
 7:   2 Feb 2014  7 17
 8:   2 Mar 2014  8 18
 9:   2 Apr 2014  9 19
10:   2 May 2014 10 20
11:   3 Jan 2014 11 21
12:   3 Feb 2014 12 22
13:   3 Mar 2014 13 23
14:   3 Apr 2014 14 24
15:   3 May 2014 15 25

I want to update columns x and y using value related to March 2014. The return that I expect will be as follows:

    grp   period  x  y
 1:   1 Jan 2014  1 11
 2:   1 Feb 2014  2 12
 3:   1 Mar 2014  3 13
 4:   1 Apr 2014  3 13
 5:   1 May 2014  3 13
 6:   2 Jan 2014  6 16
 7:   2 Feb 2014  7 17
 8:   2 Mar 2014  8 18
 9:   2 Apr 2014  8 18
10:   2 May 2014  8 18
11:   3 Jan 2014 11 21
12:   3 Feb 2014 12 22
13:   3 Mar 2014 13 23
14:   3 Apr 2014 13 23
15:   3 May 2014 13 23

I have tried the following code, but it only uses the values from row 3.

dt[which(period > dt[3, period]),`:=`(x=dt[3, x], y = dt[3, y]), by=grp]

Could you please give suggestions?

newbie
  • 917
  • 8
  • 21
  • Possible duplicate of http://stackoverflow.com/questions/7735647/replacing-nas-with-latest-non-na-value – akrun Jan 04 '17 at 10:16
  • 1
    Maybe also `dt[dt[,tail(.I, -2),by=grp]$V1,\`:=\`(x=x[1], y = y[1]), by=grp]` (if `dt` is sorted) or `dt[period >= "Mar 2014",\`:=\`(x=x[1], y = y[1]), by=grp]`? – lukeA Jan 04 '17 at 10:50

3 Answers3

4

You could replace all x and y values after March 2014 with NA, then use na.locf():

dt[period > "March 2014",`:=`(x=NA,y=NA)][,`:=`(x=na.locf(x), y=na.locf(y))]
#    grp   period  x  y
# 1:   1 Jan 2014  1 11
# 2:   1 Feb 2014  2 12
# 3:   1 Mar 2014  3 13
# 4:   1 Apr 2014  3 13
# 5:   1 May 2014  3 13
# 6:   2 Jan 2014  6 16
# 7:   2 Feb 2014  7 17
# 8:   2 Mar 2014  8 18
# 9:   2 Apr 2014  8 18
#10:   2 May 2014  8 18
#11:   3 Jan 2014 11 21
#12:   3 Feb 2014 12 22
#13:   3 Mar 2014 13 23
#14:   3 Apr 2014 13 23
#15:   3 May 2014 13 23
mtoto
  • 23,919
  • 4
  • 58
  • 71
3

Looking at this again, I think this is a pretty clean way (assuming sorting):

cols = c("x", "y")
dt[period >= "Mar 2014", (cols) := .SD[1L], by=grp, .SDcols = cols]

Another way would be to use a rolling join:

dt[period >= "Mar 2014", c("x", "y") := 
  .SD[period == "Mar 2014"][.SD, on=.(grp, period), roll=TRUE, .(x.x, x.y)]
]

How the second option works

All of the following is covered in the main doc, reachable by typing ?data.table.

DT[i, (cols) := e] will overwrite cols in the rows selected by i.

Looking more carefully at e, we see .SD, which only works inside DT[i, ...]. We can take it out of DT[i, ...] is we substitute DT[i] for .SD. From there, we can simplify e to see how it works:

> mySD = DT[period >= "Mar 2014"]
> mySD
   grp   period  x  y
1:   1 Mar 2014  3 13
2:   1 Apr 2014  4 14
3:   1 May 2014  5 15
4:   2 Mar 2014  8 18
5:   2 Apr 2014  9 19
6:   2 May 2014 10 20
7:   3 Mar 2014 13 23
8:   3 Apr 2014 14 24
9:   3 May 2014 15 25
> mySD[period == "Mar 2014"]
   grp   period  x  y
1:   1 Mar 2014  3 13
2:   2 Mar 2014  8 18
3:   3 Mar 2014 13 23
> mySD[period == "Mar 2014"][mySD, on=.(grp, period)]
   grp   period  x  y i.x i.y
1:   1 Mar 2014  3 13   3  13
2:   1 Apr 2014 NA NA   4  14
3:   1 May 2014 NA NA   5  15
4:   2 Mar 2014  8 18   8  18
5:   2 Apr 2014 NA NA   9  19
6:   2 May 2014 NA NA  10  20
7:   3 Mar 2014 13 23  13  23
8:   3 Apr 2014 NA NA  14  24
9:   3 May 2014 NA NA  15  25
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE]
   grp   period  x  y i.x i.y
1:   1 Mar 2014  3 13   3  13
2:   1 Apr 2014  3 13   4  14
3:   1 May 2014  3 13   5  15
4:   2 Mar 2014  8 18   8  18
5:   2 Apr 2014  8 18   9  19
6:   2 May 2014  8 18  10  20
7:   3 Mar 2014 13 23  13  23
8:   3 Apr 2014 13 23  14  24
9:   3 May 2014 13 23  15  25
> mySD[period == "Mar 2014"][mySD, on=.(grp, period), roll=TRUE, .(x.x, x.y)]
   x.x x.y
1:   3  13
2:   3  13
3:   3  13
4:   8  18
5:   8  18
6:   8  18
7:  13  23
8:  13  23
9:  13  23
Frank
  • 66,179
  • 8
  • 96
  • 180
1

One option with dplyr. Filtering data for period greater than equal to Mar 2014 and assigning the x and y value for period Mar 2014 to all the rows grouped by grp.

library(dplyr)
dt[dt$period >= "Mar 2014"] <- dt %>%
                                 filter(period >= "Mar 2014") %>%
                                 group_by(grp) %>%
                                 mutate(x = x[period == "Mar 2014"],
                                        y = y[period == "Mar 2014"])

dt
#    grp   period  x  y
#1:   1 Jan 2014  1 11
#2:   1 Feb 2014  2 12
#3:   1 Mar 2014  3 13
#4:   1 Apr 2014  3 13
#5:   1 May 2014  3 13
#6:   2 Jan 2014  6 16
#7:   2 Feb 2014  7 17
#8:   2 Mar 2014  8 18
#9:   2 Apr 2014  8 18
#10:  2 May 2014  8 18
#11:  3 Jan 2014 11 21
#12:  3 Feb 2014 12 22
#13:  3 Mar 2014 13 23
#14:  3 Apr 2014 13 23
#15:  3 May 2014 13 23
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213