Good morning,
I'm running a block of code and its taking a bit too long. The goal is to make a "moving average" of the last X number of days. In this case, the previous 2, 3, 4, and 5 days of scores. It needs to be from the last date Date
, for a unique id EID
. Here it the data:
df:
EID Date Score
1111 5/25/2015 10
1111 5/29/2015 6
1111 6/17/2015 9
12345 5/27/2015 10
12345 1/1/2015 8
12345 1/7/2015 9
12345 1/9/2015 10
12345 1/10/2015 7
19611 1/13/2015 8
19611 1/21/2015 10
19611 1/23/2015 9
19611 1/24/2015 10
19611 1/30/2015 5
19611 2/5/2015 6
19611 2/11/2015 10
19611 2/12/2015 7
19611 2/14/2015 10
19611 2/15/2015 6
19611 2/18/2015 10
19611 2/19/2015 10
This is what I'm currently running on 500,000+ rows across 6 data frames
uniqueID <- unique(df$EID)
rowNr <- lapply(uniqueID,function(uniqueID){which(df$EID==uniqueID)})
lastDate <- lapply(rowNr,function(n){df$Date[rev(n)[1]]})
Avg <- lapply(rowNr,function(n){mean(df$Score[n])})
prev2 <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),3),2)])})
prev3 <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),4),3)])})
prev4 <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),5),4)])})
prev5 <- lapply(rowNr,function(n){mean(df$Score[head(tail(c(NA,n),6),5)])})
Scores <- data.frame(EID = uniqueID,
avg_score = unlist(Avg),
score2 = unlist(prev2),
score3 = unlist(prev3),
score4 = unlist(prev4),
score5 = unlist(prev5))
Here are the results
View(Scores)
EID avg_score score2 score3 score4 score5
1111 8.33 7.50 8.33 NA NA
12345 8.80 8.50 8.67 8.50 8.80
19611 8.42 10.00 8.67 9.00 8.60
Any ideas to make this run faster? Im currently running this on 6 data frames and it takes 10-15 minutes for each one to process. How can this be optimized to run faster?
Thanks!