
Some data

df <- data.frame(
  dates = seq(as.Date('2018-01-01'), as.Date('2018-12-01'), by = "month"),
  user_id = c("a","b","c","a","b","c","a","b","c","a","b","c"),
  somenum = 1:12
)

 df
        dates user_id somenum
1  2018-01-01       a       1
2  2018-02-01       b       2
3  2018-03-01       c       3
4  2018-04-01       a       4
5  2018-05-01       b       5
6  2018-06-01       c       6
7  2018-07-01       a       7
8  2018-08-01       b       8
9  2018-09-01       c       9
10 2018-10-01       a      10
11 2018-11-01       b      11
12 2018-12-01       c      12

Using the answers e.g. here, I am able to calculate a 3-month rolling average. Those solutions arrange the data by user and then use rollapply().

This is fine for calculating e.g. the rolling 3-month average for a user_id. But what if I wanted the rolling average over the 3 months before that window, excluding the 3 months already covered by the first mean?

E.g. user_id 'c' has a 3-month average of (12 + 9 + 6) / 3 = 27 / 3 = 9. If I wanted the previous 3 it would be just mean(3) = 3. If my data were longer, there might have been a full 3 months going further back than the initial 3.

The end goal is to analyse user behavior change: what is the ratio of somenum for a user in the most recent 3 months vs. the preceding 3 months?

How could this be calculated?

Doug Fir

1 Answer


One option is to use the align argument of rollmean: create one column with align = "left" and another with align = "right". If you want the function to be applied even to groups with fewer than 3 observations, you have to use the rollapply function with the argument partial = TRUE.
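As a rough sketch of how this could look on the question's df (not necessarily what the answer had in mind), the right-aligned rolling mean can be computed per user with rollapply and partial = TRUE, and the preceding window obtained by shifting that column back 3 rows within each user. The column names recent3, prev3 and ratio and the helper shift3 are made up for illustration, and the windows count rows per user rather than calendar months, which assumes one row per period with no gaps.

```r
library(zoo)

df <- data.frame(
  dates = seq(as.Date("2018-01-01"), as.Date("2018-12-01"), by = "month"),
  user_id = rep(c("a", "b", "c"), times = 4),
  somenum = 1:12
)

# Sort so that each user's rows are in date order
df <- df[order(df$user_id, df$dates), ]

# Mean of the current row and up to 2 earlier rows, per user.
# partial = TRUE keeps a value when fewer than 3 rows are available.
df$recent3 <- ave(df$somenum, df$user_id, FUN = function(x) {
  rollapply(x, width = 3, FUN = mean, align = "right", partial = TRUE)
})

# Hypothetical helper: shift a vector back by 3 positions, padding with NA
shift3 <- function(x) head(c(rep(NA, 3), x), length(x))

# The window that ends 3 rows earlier is the non-overlapping "previous" window
df$prev3 <- ave(df$recent3, df$user_id, FUN = shift3)

# Ratio of the most recent window to the preceding one
df$ratio <- df$recent3 / df$prev3

df[df$user_id == "c", ]
```

For user_id 'c' the last row gives recent3 = 9 and prev3 = 3, matching the figures in the question, so the ratio is 3. Rows with fewer than 3 earlier observations get NA for prev3.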

André Costa
  • Hi Andre. Thanks for answering but I'm not following. Are you able to expand upon your solution? – Doug Fir Jun 19 '18 at 14:27
  • The align argument chooses the reference point for the new value: if align is "left", the mean is calculated over a window starting with that value; if "right", over a window ending with it (take a look at ?rollapply). The argument partial = TRUE makes the mean be calculated even where the window is incomplete, e.g. near the ends of the series. (You might also use functions such as lag and lead.) – André Costa Jun 21 '18 at 21:09
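To make the effect of align and partial concrete, here is a small illustration on a made-up vector x (not data from the question), using zoo's rollmean and rollapply:

```r
library(zoo)

x <- c(1, 2, 3, 4, 5)

# Window starts at each value and looks forward
left <- rollmean(x, k = 3, align = "left", fill = NA)
left     # 2 3 4 NA NA

# Window ends at each value and looks backward
right <- rollmean(x, k = 3, align = "right", fill = NA)
right    # NA NA 2 3 4

# partial = TRUE uses whatever part of the window is available
part <- rollapply(x, width = 3, FUN = mean, align = "right", partial = TRUE)
part     # 1.0 1.5 2.0 3.0 4.0
```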