-3

I have a csv file of every golf score recorded by player, tournament, and date, and I want to create a column that calculates the average of the past 50 scores BY player using the date field. It needs to be running average by Player.

example: Table

PLAYER,SCORE,Tournament,ROUNDDATE,Observation,ID
Matthew Fitzpatrick,60,KLM Open,42258,1,1
Jaco Van Zyl,61,Turkish Airlines Open,42306,1,2
Paul Lawrie,61,KLM Open,42257,1,3
Wade Ormsby,61,KLM Open,42257,1,4
Callum Shinkwin,62,Shenzhen International,42483,1,5
Danny Willett,62,Omega European Masters,42209,1,6
Joakim Lagergren,62,Alfred Dunhill Links Championship,42280,1,7

I have tried this code but it just produces the exact same result and not an average of anything

get.mav <- function(bp,n=50){
     require(zoo)
     if(is.na(bp[1])) bp[1] <- mean(bp,na.rm=TRUE)
     bp <- na.locf(bp,na.rm=FALSE)
     if(length(bp)<n) return(bp)
     c(bp[1:(n-1)],rollapply(bp,width=n,mean,align="right"))  
 }
test <- with(test,test[order(PLAYER,ROUNDDATE),])

test$SCORE_UPDATED <- 
     unlist(aggregate(SCORE~PLAYER,test,get.mav,na.action=NULL,n=50)$SCORE)
test
Nakx
  • 1,460
  • 1
  • 23
  • 32
  • Please `dput` your data or use some other better way of sharing it than this. Also if you don't want to share several hundred lines then perhaps it's better to ask for a running total of the last 2 scores as an example. Then you can change 2 to 50 when you apply the solution to your large data set. BTW what's SAS? You mean that old paid highly verbose statistics dinosaur from the 70's that couldn't stay competitive with FOSS statistical languages? Nobody still uses that. I don't believe you. – Hack-R Jul 24 '16 at 00:27
  • SAS has a TIBOE index rankings of 1.083% & R has 1.514%. Not exactly a YUGE margin, It also has an estimated 2-4x larger user-base than R. Not much utility in mocking it. – hrbrmstr Jul 24 '16 at 00:38

1 Answers1

0

You can just arrange by date and then mean the first 50. Here is a quick example:

# Sample Data
dat <- data.frame(date = seq(as.Date("2000/1/1"), by = "day", length.out = 365),
                  score = round(rnorm(365, 70, 5)))

# Arrange and get mean of first 50 obs
dat <- dat[order(dat$date, decreasing = TRUE),]
mean(dat$score[1:50])
Vedda
  • 7,066
  • 6
  • 42
  • 77