I'm working on a moving average calculator that works as intended, but just takes a long time to run the calc because it is currently looping through the formula instead of being vectorized. The data set is about ~16000 rows. The MA formula is written to filter out the top quartile of values appearing in the previous 45 days of usd_price. Any tips/changes to get this running more efficiently as a vector calc?
The dput output is:
> dput(data)
structure(list(loan_price = c(50000, 60000, 40000, 35000, 1e+05,
95000), cad_price = c(62500, 75000, 50000, 43750, 125000, 118750
), day = structure(c(1642118400, 1641772800, 1639958400, 1639785600,
1638316800, 1640995200), tzone = "UTC", class = c("POSIXct", "POSIXt")), fourtyfive_avg = c(251435.529507523, 251435.529507523,
251435.529507523, 251435.529507523, 251435.529507523, 251435.529507523
), Loan = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
library(readxl)
library(tidyverse)
library(zoo)
library(dplyr)
data<- data%>% mutate(day=lubridate::parse_date_time(day, "ymd"))
myfunc <- function(x){
fourtyfive_days <- as.Date(x - ddays(45))
data<-
data %>%
filter(day <= x) %>%
filter(day >= fourtyfive_days) %>%
filter(loan_price<= quantile(loan_price, probs = 0.75)) %>%
summarize(fourtyfive_avg = mean(loan_price))
return(data$fourtyfive_avg)
}
data$fourtyfive_avg <- sapply(data$day, simplify = TRUE, FUN = myfunc)