Get the last row of a previous group in data.table

Question

This is what my data table looks like:

library(data.table)
dt <- fread('
    Product  Group    LastProductOfPriorGroup
    A          1          NA
    B          1          NA
    C          2          B
    D          2          B
    E          2          B
    F          3          E
    G          3          E
')

The LastProductOfPriorGroup column is my desired column. I am trying to fetch the product from last row of the prior group. So in the first two rows, there are no prior groups and therefore it is NA. In the third row, the product in the last row of the prior group 1 is B. I am trying to accomplish this by

dt[,LastGroupProduct:= shift(Product,1), by=shift(Group,1)]

to no avail.

Rich Scriven · Accepted Answer · 2016-08-01T21:27:59.317

You could do

dt[, newcol := shift(dt[, last(Product), by = Group]$V1)[.GRP], by = Group]

This results in the following updated dt, where newcol matches your desired column with the unnecessarily long name. ;)

   Product Group LastProductOfPriorGroup newcol
1:       A     1                      NA     NA
2:       B     1                      NA     NA
3:       C     2                       B      B
4:       D     2                       B      B
5:       E     2                       B      B
6:       F     3                       E      E
7:       G     3                       E      E

Let's break the code down from the inside out. I will use ... to denote the accumulated code:

dt[, last(Product), by = Group]$V1 is getting the last values from each group as a character vector.
shift(...) shifts the character vector in the previous call
dt[, newcol := ...[.GRP], by = Group] groups by Group and uses the internal .GRP values for indexing

Update: Frank brings up a good point about my code above calculating the shift for every group over and over again. To avoid that, we can use either

shifted <- shift(dt[, last(Product), Group]$V1)
dt[, newcol := shifted[.GRP], by = Group]

so that we don't calculate the shift for every group. Or, we can take Frank's nice suggestion in the comments and do the following.

dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v]

You are using by=Group 2 times. What is the 1st one doing vs the 2nd one? and what is the significance of $V1? — gibbz00, Jul 09 '16 at 20:28
You are computing `shift(dt[, last(Product), by = Group]$V1)` again and again for each group. I would probably do `dt[dt[, last(Product), by = Group][, v := shift(V1)], on="Group", newcol := i.v]` — Frank, Jul 10 '16 at 01:26
@Frank - I think `shifted <- shift(dt[, last(Product), Group]$V1); dt[, newcol := shifted[.GRP], by = Group]` would also be okay then, since it's outside the data table call. — Rich Scriven, Jul 10 '16 at 05:07

score 11 · Answer 2 · answered Feb 23 '17 at 22:41

Another way is to save the last group's value in a variable.

this = NA_character_    # initialize
dt[, LastProductOfPriorGroup:={ last<-this; this<-last(Product); last }, by=Group]
dt
   Product Group LastProductOfPriorGroup
1:       A     1                      NA
2:       B     1                      NA
3:       C     2                       B
4:       D     2                       B
5:       E     2                       B
6:       F     3                       E
7:       G     3                       E

NB: last() is a data.table function which returns the last item of a vector (of the Product column in this case).

This should also be fast since no logic is being invoked to fetch the last group's value; it just relies on the groups running in order (which they do).

Get the last row of a previous group in data.table

2 Answers2

Linked