-2

I have the following data:

mydf[77:84,]
   id game_week points  code  web_name first_name second_name position team_name     date fixture team1 team2 home_away team_scored team_conceded minutes goals assists cleansheet goals_conceded own_goals
77  3         1     -2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 17/08/13 ARS-AVL   ARS   AVL         H           1             3      67     0       0          0              3         0
78  3         2      0 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 24/08/13 FUL-ARS   ARS   FUL         A           3             1       0     0       0          0              0         0
79  3         3      6 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 01/09/13 ARS-TOT   ARS   TOT         H           1             0      90     0       0          1              0         0
80  3         4      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 14/09/13 SUN-ARS   ARS   SUN         A           3             1      90     0       0          0              1         0
81  3         5      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 22/09/13 ARS-STK   ARS   STK         H           3             1      90     0       0          0              1         0
82  3         6      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 28/09/13 SWA-ARS   ARS   SWA         A           2             1      90     0       0          0              1         0
83  3         7      3 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 06/10/13 WBA-ARS   ARS   WBA         A           1             1      90     0       0          0              1         0
84  3         8      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 19/10/13 ARS-NOR   ARS   NOR         H           4             1      90     0       0          0              1         0

As a part of modeling exercise, I want to create a new variable, "mov_avg_min", which for a given "id", is the average of "minutes" played in the last 3 "game_week". Example, for web_name "Koscielny" his distinct "id" is 3 in this data_frame. So for id= 3 and game_week=4, a function should calculate mov_avg_min of game_weeks 1:3 (3 game_week before current game_week for the same id value). Hence in row 80, mov_avg_min = 1/3(67+0+90)=52.333

scoa
  • 19,359
  • 5
  • 65
  • 80
Manasvi Bali
  • 25
  • 1
  • 3
  • 2
    It should be something like `df %>% group_by(id) %>% mutate(mov_avg_min = mean(nth(minutes, length(minutes) - 3))) ` with `dplyr` but this sounds like a job you have to do and want others to do since you showed no effort to solve your problem by your own. – SabDeM Aug 07 '15 at 14:11
  • possible duplicate of [R: Selecting first of n consecutive rows above a certain threshold value](http://stackoverflow.com/questions/31373256/r-selecting-first-of-n-consecutive-rows-above-a-certain-threshold-value) – chappers Aug 07 '15 at 14:32
  • 1
    If you use `dplyr` with `rollapply` you can get the answer: `dat %>% group_by(id) %>% mutate(mov_avg_min=rollapply(minutes, width=3, mean, align="right", fill=NA, na.rm=TRUE))` – chappers Aug 07 '15 at 14:33

1 Answers1

0

I think the rollapply (of the zoo package) with width = 3 will include the value of the row you consider. So, for game 4 it will give you the average of minutes in games 2,3 and 4. I think you have to lag the minutes column first in order to get the average based on games 1,2 and 3. See a simple example below:

 library(dplyr)
 library(zoo)

 dt = data.frame(id = c(1,1,1,1,1,2,2,2,2,2),
                 games = c(1,2,3,4,5,1,2,3,4,5),
                 minutes = c(61,72,73,82,82,81,71,51,90,73))

 dt

 #   id games minutes
 #       1   1     1      61
 #       2   1     2      72
 #       3   1     3      73
 #       4   1     4      82
 #       5   1     5      82
 #       6   2     1      81
 #       7   2     2      71
 #       8   2     3      51
 #       9   2     4      90
 #       10  2     5      73

 dt %>% group_by(id) %>%
   mutate(lag_minutes = lag(minutes, default=NA)) %>%
   mutate(RA = rollapply(lag_minutes,width=3,mean, align= "right", fill=NA))


 #   Source: local data frame [10 x 5]
 #       Groups: id
 #       
 #          id games minutes lag_minutes       RA
 #       1   1     1      61          NA       NA
 #       2   1     2      72          61       NA
 #       3   1     3      73          72       NA
 #       4   1     4      82          73 68.66667
 #       5   1     5      82          82 75.66667
 #       6   2     1      81          NA       NA
 #       7   2     2      71          81       NA
 #       8   2     3      51          71       NA
 #       9   2     4      90          51 67.66667
 #       10  2     5      73          90 70.66667
AntoniosK
  • 15,991
  • 2
  • 19
  • 32