0

I have a data frame that looks like the following:

   eid    prod     dp
1    1    0 2015-12-28
2    1    1 2016-01-28
3    1    1 2016-02-28
4    2    1 2015-12-28
5    2    1 2016-01-28
6    2    1 2016-02-28
7    3    0 2015-11-28
8    3    1 2015-12-28
9    3    1 2016-01-28
10   3    0 2016-02-28

eid: alphanumeric id

prod: product holding on a certain date. So if you have added a product on a date from last month, it will still show this month, until you have opted out

dp: one row per date.

I am trying to build a flag that will tell me whether a eid added a new prod in that month or not. So I need output as followin in flg columns:

   eid    prod     dp    flg
1    1    0 2015-12-28   0
2    1    1 2016-01-28   1
3    1    1 2016-02-28   0
4    2    1 2015-12-28   0
5    2    1 2016-01-28   0
6    2    1 2016-02-28   0
7    3    0 2015-11-28   0
8    3    1 2015-12-28   1
9    3    1 2016-01-28   0
10   3    0 2016-02-28  -1

Assume that the data frame isn't sorted in any specific order

Roland
  • 127,288
  • 10
  • 191
  • 288

1 Answers1

0

We just need diff after grouping by 'eid'

library(data.table)
setDT(df1)[, flg := c(0, diff(prod)), eid]
df1
#    eid prod         dp flg
# 1:   1    0 2015-12-28   0
# 2:   1    1 2016-01-28   1
# 3:   1    1 2016-02-28   0
# 4:   2    1 2015-12-28   0
# 5:   2    1 2016-01-28   0
# 6:   2    1 2016-02-28   0
# 7:   3    0 2015-11-28   0
# 8:   3    1 2015-12-28   1
# 9:   3    1 2016-01-28   0
#10:   3    0 2016-02-28  -1

Or using the same approach in dplyr

library(dplyr)
df1 %>%
    group_by(eid) %>%
    mutate(flg = c(0, diff(prod)))

Or with base R

with(df1, ave(prod, eid, FUN = function(x) c(0, diff(x))))
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    You could probably add the dplyr solution to the dupe target instead – David Arenburg Dec 19 '16 at 09:24
  • Hi, does this mean that the rows needs to be ordered by dp column already? – Sumit Dhar Dec 19 '16 at 09:28
  • @SumitDhar If the rows are not ordered, then `setDT(df1)[order(dp), flg := c(0, diff(prod)), eid]` assuming 'dp' is `Date` class – akrun Dec 19 '16 at 09:32
  • Also can we generalize this a bit to do a operations across k rows? Say max of a certain prod value in the past 3 months or the like? – Sumit Dhar Dec 19 '16 at 09:33
  • @SumitDhar Yes, you can create a grouping column using `rep` along with the 'eid' and then do the assignment. If it is based on 'Date', then `cut` the 'Date' column into different groups using the `breaks = '3 months'` and use it as grouping variable – akrun Dec 19 '16 at 09:34