13

This is what my data table looks like:

library(data.table)
dt <- fread('
    Product  Group    LastProductOfPriorGroup
    A          1          NA
    B          1          NA
    C          2          B
    D          2          B
    E          2          B
    F          3          E
    G          3          E
')

The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from last row of the prior group. So in the first two rows, there are no prior groups and therefore it is NA. In the third row, the product in the last row of the prior group 1 is B. I am trying to accomplish this by

dt[,LastGroupProduct:= shift(Product,1), by=shift(Group,1)]

to no avail.

Uwe
  • 41,420
  • 11
  • 90
  • 134
gibbz00
  • 1,947
  • 1
  • 19
  • 31

2 Answers2

17

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

  • dt[, last(Product), by = Group]$V1 is getting the last values from each group as a character vector.
  • shift(...) shifts the character vector in the previous call
  • dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP values for indexing

Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v] 
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
  • You are using by=Group 2 times. What is the 1st one doing vs the 2nd one? and what is the significance of $V1? – gibbz00 Jul 09 '16 at 20:28
  • 1
    @gibbz00 - I added some explanation. Hope it helps. – Rich Scriven Jul 09 '16 at 20:35
  • 1
    You are computing `shift(dt[, last(Product), by = Group]$V1)` again and again for each group. I would probably do `dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v]` – Frank Jul 10 '16 at 01:26
  • 1
    @Frank - I think `shifted <- shift(dt[, last(Product), Group]$V1); dt[, newcol := shifted[.GRP], by = Group]` would also be okay then, since it's outside the data table call. – Rich Scriven Jul 10 '16 at 05:07
11

Another way is to save the last group's value in a variable.

this = NA_character_    # initialize
dt[, LastProductOfPriorGroup:={ last<-this; this<-last(Product); last }, by=Group]
dt
   Product Group LastProductOfPriorGroup
1:       A     1                      NA
2:       B     1                      NA
3:       C     2                       B
4:       D     2                       B
5:       E     2                       B
6:       F     3                       E
7:       G     3                       E

NB: last() is a data.table function which returns the last item of a vector (of the Product column in this case).

This should also be fast since no logic is being invoked to fetch the last group's value; it just relies on the groups running in order (which they do).

Matt Dowle
  • 58,872
  • 22
  • 166
  • 224