R: Calculating average by team

Question

I have soccer results data in the following format (thousands of observations):

     Div  date       value     pts
1    E0 2011-08-13   Blackburn 0.0
2    E0 2011-08-13      Fulham 0.5
3    E0 2011-08-13   Liverpool 0.5
4    E0 2011-08-13   Newcastle 0.5
5    E0 2011-08-13         QPR 0.0
6    E0 2011-08-13       Wigan 0.5
7    E0 2011-08-14       Stoke 0.5
8    E0 2011-08-14   West Brom 0.0
9    E0 2011-08-15    Man City 1.0
10   E0 2011-08-20     Arsenal 0.0
11   E0 2011-08-20 Aston Villa 1.0

There are other variables as well. "value" is the team, and "pts" is the final result (win/loss/draw) as a numerical value. I'm trying to add a new variable that holds the average of pts over the last X games for the team in that row. How do I do this without using some horrible loop?

score 3 · Accepted Answer · edited May 23 '17 at 10:34

One approach uses rollmean from the zoo package together with ddply from the plyr package:

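library(zoo)   # rollmean
library(plyr)  # ddply
# toy data: 5 teams (a-e), 10 games each, with random results
dat <- data.frame(value=letters[1:5], pts=sample(c(0, 0.5, 1), 50, replace=TRUE))
# rolling mean over 5 games per team; each window ends at the current row
ddply(dat, .(value), summarise, avg=rollmean(pts, k=5, align='right'))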

However, as far as I understand it, a rolling average shortens your data set by definition: the first k-1 games of each team have no complete window. You can supply a fill argument to pad those positions instead:

ddply(dat, .(value), summarise, avg=rollmean(pts, k=5, fill=NA, align='right'))
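To make the difference between the shortened and the padded result concrete, here is a minimal base-R sketch for a single team. It assumes made-up results in `pts` and uses `stats::filter` with `sides = 1` as a stand-in for a right-aligned rolling mean; it is an illustration, not the zoo implementation.

```r
# Made-up results for one team, oldest game first (illustrative data only).
pts <- c(0, 0.5, 1, 1, 0.5, 0)
k <- 3

# sides = 1 averages the current value and the k - 1 values before it,
# so the first k - 1 positions have no complete window and come back as NA.
padded <- as.numeric(stats::filter(pts, rep(1 / k, k), sides = 1))

# Dropping the NA positions gives the shortened series.
shortened <- padded[!is.na(padded)]

length(padded)     # same length as pts
length(shortened)  # length(pts) - k + 1
```

The padded version lines up row by row with the input, which is what is needed when the result is stored as a new column.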

MYaseen208 · Answer 2 · 2012-02-02T18:30:52.180

1

Try the ave function from the stats package.

Trt <- gl(n=2, k=3, length=2*3, labels=c("A", "B"))
Y <- 1:6
Data <- data.frame(Trt, Y)
Data
#   Trt Y
# 1   A 1
# 2   A 2
# 3   A 3
# 4   B 4
# 5   B 5
# 6   B 6
# ave returns one value per row: the mean of Y within that row's Trt group
Data$TrtMean <- ave(Y, Trt, FUN=mean)
Data
#   Trt Y TrtMean
# 1   A 1       2
# 2   A 2       2
# 3   A 3       2
# 4   B 4       5
# 5   B 5       5
# 6   B 6       5

edited Feb 02 '12 at 18:30

answered Feb 02 '12 at 17:57

MYaseen208

Please provide some example code, this makes it a lot clearer for the OP. – Paul Hiemstra Feb 02 '12 at 18:12
@PaulHiemstra: I've added an example. – MYaseen208 Feb 02 '12 at 18:31
Is there an easy way to modify this to do what the question asks for though? – Dason Feb 02 '12 at 19:58
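One possible way to adapt `ave` to the question, offered as a sketch: `ave` accepts any `FUN` that returns a vector as long as its input, so a rolling-mean helper can be passed instead of `mean`. The helper `trailing_mean`, the column name `avg2`, and the data below are hypothetical, and the rows are assumed to be sorted by date within each team.

```r
# Hypothetical helper: mean of the current value and the k - 1 before it.
# Positions without k values available are left as NA.
trailing_mean <- function(x, k) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= k) {
    cs <- c(0, cumsum(x))
    out[k:n] <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
  }
  out
}

# Made-up data, assumed to be in date order within each team.
d <- data.frame(
  value = c("A", "B", "A", "B", "A", "B"),
  pts   = c(1, 0, 0.5, 0.5, 0, 1)
)

# ave applies the helper within each team and puts results back in row order.
d$avg2 <- ave(d$pts, d$value, FUN = function(x) trailing_mean(x, k = 2))
d
```

Note that each window here includes the current game. If the average should cover only earlier games, the result would need to be shifted down by one row within each team.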

jbaums · Answer 3 · 2012-02-05T13:29:36.727

This can be done quite efficiently with tapply. I've altered your data somewhat by duplicating teams' games, with random scores and dates. The code below takes the mean of each team's 2 most recent games; the window size is the second argument to tail.

# create some data
d <- structure(list(Div = structure(rep(1L, 33), .Label = " E0", 
  class = "factor"), date = structure(c(15013, 14990, 14996, 15001, 14995, 15006, 
  15020, 15032, 15023, 15022, 15015, 15016, 15034, 14994, 14986, 14998, 14982, 
  14979, 14980, 15016, 15031, 15013, 15031, 14999, 15025, 14978, 15007, 15026, 
  14992, 14997, 15023, 14986, 15028), class = "Date"), 
  value = structure(c(3L, 4L, 5L, 7L, 8L, 11L, 9L, 10L, 6L, 1L, 2L, 3L, 4L, 5L, 
  7L, 8L, 11L, 9L, 10L, 6L, 1L, 2L, 3L, 4L, 5L, 7L, 8L, 11L, 9L, 10L, 6L, 1L, 
  2L), .Label = c("Arsenal", "Aston Villa", "Blackburn", "Fulham", "Liverpool",
  "Man City", "Newcastle", "QPR", "Stoke", "West Brom", "Wigan"), 
  class = "factor"), pts = c(0.5, 0.5, 0.5, 1, 1, 1, 1, 0, 1, 0.5, 0, 1, 1, 1, 1, 
  0.5, 0.5, 0, 0.5, 0.5, 0, 0, 0, 1, 0, 0, 0.5, 0, 1, 0, 0.5, 0.5, 0.5)), 
  .Names = c("Div", "date", "value", "pts"), row.names = c(NA, 33L), 
  class = "data.frame")

# sort rows by date
d2 <- d[order(d$date),]
# mean of all games
tapply(d2$pts, d2$value, mean)
# mean of last 2 games
tapply(d2$pts, d2$value, function(x) mean(tail(x, 2)))

# To tidy up the output, you could use simplify=FALSE and do.call(rbind, x):
# e.g., mean of last 2 games:
do.call(rbind, tapply(d2$pts, d2$value, function(x) mean(tail(x, 2)), 
  simplify=FALSE))

            [,1]
Arsenal     0.25
Aston Villa 0.25
Blackburn   0.50
Fulham      1.00
Liverpool   0.25
Man City    0.75
Newcastle   1.00
QPR         0.50
Stoke       1.00
West Brom   0.00
Wigan       0.50

In fact, `aggregate` would do the job in one step, e.g. `aggregate(d2$pts, list(d2$value), function(x) mean(tail(x, 2)))` – jbaums Feb 05 '12 at 13:26
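As a small runnable sketch of the `aggregate` idea from the comment, here is the formula interface applied to made-up data. The data frame, the integer `date` counter, and the name `last2` are assumptions for illustration; with real data, `date` would be a Date column as in the answer.

```r
# Made-up results; 'date' is a simple game counter rather than a real Date.
d <- data.frame(
  value = c("A", "B", "A", "B", "A", "B"),
  date  = 1:6,
  pts   = c(1, 0, 0.5, 0.5, 0, 1)
)

# Sort by date so that tail() picks the most recent games.
d2 <- d[order(d$date), ]

# One row per team: mean of that team's last 2 games.
last2 <- aggregate(pts ~ value, data = d2, FUN = function(x) mean(tail(x, 2)))
last2
```

This returns a data frame with one row per team, which is easier to merge back onto other data than the named vector returned by `tapply`.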
