finding change of value in a variable based on other columns in row by row operations

Question

I have a data frame that looks like the following:

   eid    prod     dp
1    1    0 2015-12-28
2    1    1 2016-01-28
3    1    1 2016-02-28
4    2    1 2015-12-28
5    2    1 2016-01-28
6    2    1 2016-02-28
7    3    0 2015-11-28
8    3    1 2015-12-28
9    3    1 2016-01-28
10   3    0 2016-02-28

eid: alphanumeric id

prod: product holding on a certain date. So if you have added a product on a date from last month, it will still show this month, until you have opted out

dp: one row per date.

I am trying to build a flag that will tell me whether a eid added a new prod in that month or not. So I need output as followin in flg columns:

   eid    prod     dp    flg
1    1    0 2015-12-28   0
2    1    1 2016-01-28   1
3    1    1 2016-02-28   0
4    2    1 2015-12-28   0
5    2    1 2016-01-28   0
6    2    1 2016-02-28   0
7    3    0 2015-11-28   0
8    3    1 2015-12-28   1
9    3    1 2016-01-28   0
10   3    0 2016-02-28  -1

Assume that the data frame isn't sorted in any specific order

score 0 · Answer 1 · edited Dec 19 '16 at 09:23

0

We just need diff after grouping by 'eid'

library(data.table)
setDT(df1)[, flg := c(0, diff(prod)), eid]
df1
#    eid prod         dp flg
# 1:   1    0 2015-12-28   0
# 2:   1    1 2016-01-28   1
# 3:   1    1 2016-02-28   0
# 4:   2    1 2015-12-28   0
# 5:   2    1 2016-01-28   0
# 6:   2    1 2016-02-28   0
# 7:   3    0 2015-11-28   0
# 8:   3    1 2015-12-28   1
# 9:   3    1 2016-01-28   0
#10:   3    0 2016-02-28  -1

Or using the same approach in dplyr

library(dplyr)
df1 %>%
    group_by(eid) %>%
    mutate(flg = c(0, diff(prod)))

Or with base R

with(df1, ave(prod, eid, FUN = function(x) c(0, diff(x))))

edited Dec 19 '16 at 09:23

David Arenburg

91,361
17
137
196

answered Dec 19 '16 at 09:19

akrun

874,273
37
540
662

1

You could probably add the dplyr solution to the dupe target instead – David Arenburg Dec 19 '16 at 09:24
Hi, does this mean that the rows needs to be ordered by dp column already? – Sumit Dhar Dec 19 '16 at 09:28
@SumitDhar If the rows are not ordered, then `setDT(df1)[order(dp), flg := c(0, diff(prod)), eid]` assuming 'dp' is `Date` class – akrun Dec 19 '16 at 09:32
Also can we generalize this a bit to do a operations across k rows? Say max of a certain prod value in the past 3 months or the like? – Sumit Dhar Dec 19 '16 at 09:33
@SumitDhar Yes, you can create a grouping column using `rep` along with the 'eid' and then do the assignment. If it is based on 'Date', then `cut` the 'Date' column into different groups using the `breaks = '3 months'` and use it as grouping variable – akrun Dec 19 '16 at 09:34

finding change of value in a variable based on other columns in row by row operations

1 Answers1