2

I have a data frame containing number of page views per user, per week. I want to determine, for each user, whether their views increased, decreased, or stayed the same after a certain event. My data looks like this:

Userid week xeventinweek numviews
Alice   1    2            5
Alice   2    0            3
Alice   4    1            6
Bob     2    2            3
Bob     3    0            5

So in this case, Alice's views decreased after she had 2 events in week 1, and she had no events in week 2 to measure by. Bob, however, increased his views from 3 to 5 the week after he had two events.

I would like to get a table with the difference in views for every week that had at least one event. So it should look something like this:

Userid  week xeventinweek numviews numnextweek difference
Alice    1      2           5          3               -2
Alice    4      1           6          NA              NA #the row for week 2 is missing because there were no events then for Alice
Bob      2      2           3          5                2

It is not essential to have both the numnextweek and difference columns - either or is ok.

I was able to do this using data.table and a for loop, but it took so long to run that it wasn't feasible. I thought of using a rolling join, but it doesn't seem possible with grouped data (i.e. it would need to be done individually for each Userid.) How can I do this using data.table's native functionality?

ekad
  • 14,436
  • 26
  • 44
  • 46
bsg
  • 825
  • 2
  • 14
  • 34

2 Answers2

3

Using match:

dat[, numnextweek := numviews[match(week + 1, week)] , by=Userid]
dat[, difference := numviews - numnextweek , by=Userid]
dat[xeventinweek != 0]

#   Userid week xeventinweek numviews numnextweek difference
#1:  Alice    1            2        5           3          2
#2:  Alice    4            1        6          NA         NA
#3:    Bob    2            2        3           5         -2
thelatemail
  • 91,185
  • 12
  • 128
  • 188
1

Since version v1.9.6 (on CRAN 19 Sep 2015), the shift() function is available in data.table:

DT[, difference := shift(numviews, type = "lead") - numviews, by = Userid][
  xeventinweek != 0L]
   Userid week xeventinweek numviews difference
1:  Alice    1            2        5         -2
2:  Alice    4            1        6         NA
3:    Bob    2            2        3          2
Uwe
  • 41,420
  • 11
  • 90
  • 134
  • I guess you should be checking for week + 1L (maybe with a join), not just the next row. OP apparently has missing weeks. I mean `DT[.(u = Userid, w = week + 1L, numviews = numviews), on=.(Userid = u, week = w), i.numviews - x.numviews]` more similar to thela's. – Frank Sep 08 '17 at 17:23
  • @Frank I guess, you are right. Unfortunately, the missing week 3 is right after week 2 without event which isn't to be shown anyway. So, just from the expected result we can't be 100% sure how the OP wants to deal with missing weeks. – Uwe Sep 08 '17 at 17:29