I have the code below, which takes ages to run. I have already tried several things, such as limiting the number of loops, using ifelse statements, and declaring as much as possible outside of the loop. However, it still takes a very long time.
What would be a good way to improve on this part of my code to improve its processing speed? Are there some things I'm not seeing?
z <- 625
numDays <- 365
k <- numDays * 96
#To estimate the size of the list
df <- rmarkovchain(n = 365, object = mcList, t0 = "Home", include.t0 = TRUE)
allDf <- rep(list(df), z)
#an example of df below:
Locations <- c("Home", "Bakery", "Grocery", "Home-Bakery", "Home-Grocery", "Bakery-Home", "Bakery-Grocery", "Grocery-Home", "Grocery-Bakery")
Iteration <- rep(1:96, 365)
df <- data.frame(Iteration, values = sample(Locations, k, replace = TRUE))
#The loop takes a huge amount of time
for(y in 1:z){
df <- rmarkovchain(n=365, object = mcList, t0= "Home", include.t0 = TRUE)
df$Begin <- 0
df[1,3] <- b
df$Still <- ifelse(df$values == "Home", 1, 0)
df$KM <- vlookup(df$values, averageDistance, lookup_column = 1, result_column = 2)
df$Load <- ifelse(df$Still == 1, cp, 0)
df$costDistance <- df$KM * 0.21
df$End <- 0
df[is.na(df)] <- 0
df$reduce <- rep(1:97, numDays)
df <- df %>% filter(reduce != 97)
df$Load <- ifelse(df$reduce <= 69 | df$reduce >= 87, df$Load, 0)
for(i in 1:k) {
df[i,3] <- ifelse(df[i,3] < b, pmin(df[i,3] + df[i,6], b), df[i,3])
df[i,8] <- df[i,3] - df[i,7]
j <- i + 1
df[j,3] <- df[i,8]   # note: when i == k this writes one row past the end of df
}
allDf[[y]] <- df
}
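One small thing that can stay vectorized: if `vlookup()` from expss shows up in the profile, a base-R `match()` does the same single-key lookup without the overhead. A sketch, assuming `averageDistance` has `Locations` as its first column and `distance` as its second, as in the reproducible example:

```
# Base-R alternative to expss::vlookup for a single-key lookup
df$KM <- averageDistance$distance[match(df$values, averageDistance$Locations)]
```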
EDIT: After Minem's suggestion to profile with profvis, I found that the second for-loop takes by far the most time. It now looks like this:
for(i in 1:k) {
mainVector <- df[i,3]
extra <- df[i,6]
subtractingVector <- df[i,7]
mainVector <- ifelse(mainVector < b, pmin(mainVector + extra, b), mainVector )
newMain <- mainVector - subtractingVector
j <- i + 1
df[j,3] <- newMain
}
Extracting the three values still takes some time, but the last line, which writes the calculated value back into the data frame, costs by far the most. Is there any way to improve on this?
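Since each row depends on the previous one, the recurrence itself cannot be vectorized, but the data-frame indexing can be eliminated: pull the columns out as plain numeric vectors once, run the loop over those, and write the results back in a single assignment. Indexing a plain vector is orders of magnitude cheaper than `df[i, j] <-`, which copies the data frame on every write. A sketch, assuming `b` is the cap and columns 3, 6, 7, 8 are Begin, Load, costDistance, End as above:

```
load <- df$Load
cost <- df$costDistance
begin <- numeric(k)
v <- b                        # df[1,3] <- b above
for (i in 1:k) {
  if (v < b) v <- min(v + load[i], b)   # recharge, capped at b
  begin[i] <- v               # Begin of row i
  v <- v - cost[i]            # End of row i, carried into row i + 1
}
df$Begin <- begin
df$End <- begin - cost        # column 8 in the original loop
```

Note this also avoids the out-of-bounds write on the last iteration, since the carry simply stops at row k.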
Edit 2: a reproducible example for everything above:
library(dplyr)
library(markovchain)
library(expss)
Locations <- c("Home", "Bakery", "Grocery", "Home-Bakery", "Home-Grocery", "Bakery-Home", "Bakery-Grocery", "Grocery-Home", "Grocery-Bakery")
# Rows of a transition matrix must sum to 1, and the states need names
matrixExample <- matrix(runif(81, min = 0, max = 1), nrow = 9, ncol = 9, dimnames = list(Locations, Locations))
matrixExample <- matrixExample / rowSums(matrixExample)
mcExample <- new("markovchain", transitionMatrix = matrixExample)
mcListLoop <- rep(list(mcExample), 96)
mcList <- new("markovchainList", markovchains = mcListLoop)
distance <- runif(9, min = 5, max = 10)
# cbind() would coerce distance to character; build the data frame directly
averageDistance <- data.frame(Locations, distance)
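If a plain-vector loop is still too slow across z = 625 repetitions, the recurrence compiles to a few lines of C++. A sketch with Rcpp, assuming the same Begin/Load/costDistance semantics as in the original loop (`b` is the cap, and the returned vector becomes the Begin column):

```
library(Rcpp)
cppFunction('
NumericVector beginSeries(double b, NumericVector load, NumericVector cost) {
  int n = load.size();
  NumericVector begin(n);
  double v = b;                                // df[1,3] <- b
  for (int i = 0; i < n; i++) {
    if (v < b) v = std::min(v + load[i], b);   // recharge, capped at b
    begin[i] = v;                              // Begin of row i
    v -= cost[i];                              // End of row i, carried forward
  }
  return begin;
}')
# df$Begin <- beginSeries(b, df$Load, df$costDistance)
# df$End   <- df$Begin - df$costDistance
```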