I have a table of historical daily prices for several products, with NA for future dates. A separate column holds the expected price increase for a product on a given future date; each increase is applied to the prior day's price.
I have written a for-loop to calculate the expected prices, but it runs very slowly over the ~500,000 records it loops through.
All historical prices are populated in the table, while all projected prices are NA.
Example of current table (old_table):
date      product  price  incr_amt
==================================
...       ...      ...    ...
10/14/19  prod1    50     1.0
10/15/19  prod1    50     1.0
10/16/19  prod1    NA     1.0
...       ...      ...    ...
04/01/20  prod1    NA     1.05
04/02/20  prod1    NA     1.0
...       ...      ...    ...
...       ...      ...    ...
10/14/19  prod2    35     1.0
10/15/19  prod2    35     1.0
10/16/19  prod2    NA     1.0
...       ...      ...    ...
01/01/20  prod2    NA     1.02
01/02/20  prod2    NA     1.0
...       ...      ...    ...
My current code groups by product; if a row's price is NA, it calculates the price as the lagged price * incr_amt, then recomputes the lagged price for the next iteration, repeating until every row in the table has been processed.
Example result (new_table):
date      product  price  incr_amt
==================================
...       ...      ...    ...
10/14/19  prod1    50     1.0
10/15/19  prod1    50     1.0
10/16/19  prod1    50     1.0
...       ...      ...    ...
04/01/20  prod1    52.5   1.05
04/02/20  prod1    52.5   1.0
...       ...      ...    ...
...       ...      ...    ...
10/14/19  prod2    35     1.0
10/15/19  prod2    35     1.0
10/16/19  prod2    35     1.0
...       ...      ...    ...
01/01/20  prod2    35.7   1.02
01/02/20  prod2    35.7   1.0
...       ...      ...    ...
My current code works, but it takes over an hour to run. Because each iteration depends on the previous one and order matters, I don't know whether there is a workaround that avoids the loop.
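One thing I noticed while writing this out: because each filled price is just the previous price times a factor, the recurrence telescopes into a running product of the increase factors applied to the last known price. A minimal base-R sketch of that identity, using the prod1 numbers from the example above (this is my own check, not part of the original code):

```r
# Last known price for prod1 and the incr_amt values on the projected (NA) rows, in date order
last_known <- 50
incr <- c(1.0, 1.0, 1.05, 1.0)

# Loop version: price[i] = price[i-1] * incr[i]
loop_price <- numeric(length(incr))
prev <- last_known
for (i in seq_along(incr)) {
  prev <- prev * incr[i]
  loop_price[i] <- prev
}

# Vectorized version: the recurrence collapses into a cumulative product
vec_price <- last_known * cumprod(incr)

all.equal(loop_price, vec_price)  # TRUE
```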
Current Code:
library(tidyverse)

old_table <- tribble(
  ~date,        ~product, ~price, ~incr_amt,
  "2019-10-14", "prod1",  50,     1.0,
  "2019-10-15", "prod1",  50,     1.0,
  "2019-10-16", "prod1",  NA,     1.0,
  "2019-10-17", "prod1",  NA,     1.0,
  "2019-10-18", "prod1",  NA,     1.0,
  "2019-10-19", "prod1",  NA,     1.05,
  "2019-10-20", "prod1",  NA,     1.0,
  "2019-10-21", "prod1",  NA,     1.0,
  "2019-10-14", "prod2",  35,     1.0,
  "2019-10-15", "prod2",  35,     1.0,
  "2019-10-16", "prod2",  NA,     1.0,
  "2019-10-17", "prod2",  NA,     1.0,
  "2019-10-18", "prod2",  NA,     1.0,
  "2019-10-19", "prod2",  NA,     1.0,
  "2019-10-20", "prod2",  NA,     1.0,
  "2019-10-21", "prod2",  NA,     1.02,
  "2019-10-22", "prod2",  NA,     1.0
)

# Seed the lagged price within each product
new_table <- old_table %>%
  group_by(product) %>%
  mutate(lag_price = lag(price))

# Fill each NA price from the prior day's price, one row at a time
for (i in 1:nrow(new_table)) {
  if (is.na(new_table$price[[i]])) {
    new_table$price[[i]] <- new_table$lag_price[[i]] * new_table$incr_amt[[i]]
    # Recompute the entire lag column so the next NA row sees the value just filled in
    new_table$lag_price <- lag(new_table$price)
  }
}
Again, this runs correctly but takes over an hour on the ~500,000 records. How can I improve this process? Thanks.
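In case it helps frame answers: one direction I have started exploring, sketched under the assumption that within each product every NA row comes after the last observed price (which holds in my data), is to collapse the recurrence into a per-group cumulative product instead of looping. This is a rough sketch, not tested at the full 500,000-row scale:

```r
library(tidyverse)

# Small subset of old_table from above, just so the sketch is runnable on its own
old_table <- tribble(
  ~date,        ~product, ~price, ~incr_amt,
  "2019-10-15", "prod1",  50,     1.0,
  "2019-10-16", "prod1",  NA,     1.0,
  "2019-10-19", "prod1",  NA,     1.05,
  "2019-10-15", "prod2",  35,     1.0,
  "2019-10-21", "prod2",  NA,     1.02
)

new_table <- old_table %>%
  group_by(product) %>%
  mutate(
    # Rows with a known price contribute no growth to the running product
    growth = if_else(is.na(price), incr_amt, 1),
    # Each NA price = last known price in the group * cumulative growth so far
    price  = if_else(is.na(price),
                     last(price[!is.na(price)]) * cumprod(growth),
                     price)
  ) %>%
  select(-growth) %>%
  ungroup()
```

On the small subset this reproduces the loop's results (prod1 reaches 52.5 on 10/19, prod2 reaches 35.7 on 10/21), but I would welcome corrections if the grouped cumprod approach has pitfalls I'm missing.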