I have written the following function in R to calculate, for each date, the two-day mean of VAR1 and VAR2 (the mean of that date's value and the previous day's value). The dataframe has the columns DATE (YYYY-MM-DD), ID, VAR1, and VAR2, and there are no missing dates.
df <- data.frame
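TWODAY <- function(df){
  # Two-day mean of VAR1: average of the current row and the previous row
  df$TWODAY_VAR1 <- NA
  for(j in seq_len(nrow(df))[-1]){  # rows 2..n; empty when the group has only one row
    df$TWODAY_VAR1[j] <- mean(df$VAR1[j:(j-1)])
  }
  # Same calculation for VAR2
  df$TWODAY_VAR2 <- NA
  for(j in seq_len(nrow(df))[-1]){
    df$TWODAY_VAR2[j] <- mean(df$VAR2[j:(j-1)])
  }
  return(df)
}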
I then applied this function to each ID in my dataframe with ddply (from the plyr package):
df <- ddply(df, "ID", TWODAY)
However, my dataframe has over 13,000,000 observations, and this runs very slowly. Does anyone have recommendations for how I could edit my code to make it more efficient?
Any advice would be greatly appreciated!
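For reference, here is a minimal sketch of the same calculation written without the row-by-row loop, using base R only. It assumes the rows are sorted by ID and then DATE (the sketch sorts them explicitly), and it relies on there being no missing dates, so the previous row within an ID is always the previous day. The helper name two_day_mean and the toy data are hypothetical and only illustrate the idea; I have not timed it on the full data.

```r
# Hypothetical helper: mean of each value and the previous row's value,
# set to NA wherever the previous row belongs to a different ID.
two_day_mean <- function(x, id) {
  n <- length(x)
  if (n == 0L) return(numeric(0))
  prev <- c(NA, x[-n])                    # value on the previous row
  same_id <- c(FALSE, id[-1] == id[-n])   # TRUE where the previous row has the same ID
  out <- (x + prev) / 2                   # two-day mean, computed for all rows at once
  out[!same_id] <- NA                     # first row of each ID has no previous day
  out
}

# Toy data with made-up values, in the same column layout as the real dataframe
toy <- data.frame(
  DATE = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03",
                   "2020-01-01", "2020-01-02")),
  ID   = c("a", "a", "a", "b", "b"),
  VAR1 = c(1, 3, 5, 10, 20),
  VAR2 = c(2, 4, 6, 8, 10)
)

# Sort so that consecutive rows within an ID are consecutive days
toy <- toy[order(toy$ID, toy$DATE), ]

toy$TWODAY_VAR1 <- two_day_mean(toy$VAR1, toy$ID)
toy$TWODAY_VAR2 <- two_day_mean(toy$VAR2, toy$ID)
```

Because the whole column is processed in one vectorized step, this avoids both the per-row assignment into the dataframe and the per-ID split done by ddply, which is likely where most of the time goes. Like the loop version, a row whose value or previous value is NA gets NA as its two-day mean.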