I'm new to R and I'm stuck on a problem I can't solve by myself.
A friend recommended that I use one of the apply functions, but I don't see how to use them in this case. Anyway, on to the problem! =)
Inside the inner while loop I have an ifelse, and that is the bottleneck: each iteration takes about 1 second on average. The slow part is marked with #slow part start/end in the code.
Since the inner loop body runs 2000*100 = 200000 times, that is about 200000 seconds, or approximately 55.5 hours, for each run of this code. The bigger problem is that this code will be reused a lot, so x*55.5 hours is just not feasible.
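To make the estimate easy to check, here is the arithmetic as a small R snippet. The variable names are made up for illustration, and the 1 second per iteration is only the rough average mentioned above, not a careful benchmark.

```r
# Rough runtime estimate; all names here are illustrative only
outer_iterations <- 2000        # outer while loop
inner_iterations <- 100         # inner while loop
seconds_per_iteration <- 1      # rough observed average, not a benchmark

total_seconds <- outer_iterations * inner_iterations * seconds_per_iteration
total_hours <- total_seconds / 3600   # 3600 seconds per hour

print(total_seconds)
print(total_hours)
```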
Below is the part of the code relevant to the question:
#dt is a data.table with close to 1.5 million observations of 11 variables
#rand.mat is a 110*100 integer matrix
j <- 1
while(j <= 2000)
{
  #other code is executed here, not relevant to the question
  i <- 1
  while(i <= 100)
  {
    #slow part start
    dt$column2 = ifelse(dt$datecolumn %in% c(rand.mat[,i]) & dt$column4==index[i], NA, dt$column2)
    #slow part end
    i <- i + 1
  }
  #other code is executed here, not relevant to the question
  j <- j + 1
}
Please, any advice would be greatly appreciated.
EDIT - Run the code below to reproduce the problem:
library(data.table)
dt = data.table(
  datecolumn = c("20121101", "20121101", "20121104", "20121104", "20121130", "20121130",
                 "20121101", "20121101", "20121104", "20121104", "20121130", "20121130"),
  column2 = c("5", "3", "4", "6", "8", "9", "2", "4", "3", "5", "6", "8"),
  column3 = c("5", "3", "4", "6", "8", "9", "2", "4", "3", "5", "6", "8"),
  column4 = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2")
)
unq_date <- c(20121101L,
20121102L, 20121103L, 20121104L, 20121105L, 20121106L, 20121107L,
20121108L, 20121109L, 20121110L, 20121111L, 20121112L, 20121113L,
20121114L, 20121115L, 20121116L, 20121117L, 20121118L, 20121119L,
20121120L, 20121121L, 20121122L, 20121123L, 20121124L, 20121125L,
20121126L, 20121127L, 20121128L, 20121129L, 20121130L
)
index <- as.numeric(dt$column4)
numberOfRepititions <- 2
set.seed(131107)
rand.mat <- replicate(numberOfRepititions, sample(unq_date, numberOfRepititions))
i <- 1
while(i <= numberOfRepititions)
{
  dt$column2 = ifelse(dt$datecolumn %in% c(rand.mat[,i]) & dt$column4==index[i], NA, dt$column2)
  i <- i + 1
}
Note that with this sample data the loop can't be run more than 2 times. For more iterations, dt would need more rows so that it contains all of the original 100 types of column4 (which is just an integer value from 1 to 100).
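One possible direction, as a sketch only and not benchmarked on the full 1.5 million rows: because every inner iteration applies the same kind of test, the repeated ifelse calls might be collapsed into a single data.table update join against a table of (date, column4) pairs. The sketch assumes that index holds exactly one column4 value per column of rand.mat. It uses a small hand-written rand.mat rather than the random one so the result is reproducible. The names lookup, loop_dt and join_dt are made up for illustration.

```r
library(data.table)

# Toy data shaped like the reproducible example (column3 omitted for brevity)
dt <- data.table(
  datecolumn = rep(c("20121101", "20121101", "20121104",
                     "20121104", "20121130", "20121130"), times = 2),
  column2 = c("5", "3", "4", "6", "8", "9", "2", "4", "3", "5", "6", "8"),
  column4 = rep(c("1", "2"), each = 6)
)

# Assumption: hand-written stand-in for rand.mat, one column per iteration
rand.mat <- matrix(c(20121101L, 20121104L,   # dates for iteration 1
                     20121130L, 20121102L),  # dates for iteration 2
                   nrow = 2)
# Assumption: index[i] is the column4 value that belongs to rand.mat[, i]
index <- c(1, 2)

# Reference result: the original row-by-row logic, one pass per column
loop_dt <- copy(dt)
for (i in seq_len(ncol(rand.mat))) {
  hit <- loop_dt$datecolumn %in% rand.mat[, i] & loop_dt$column4 == index[i]
  loop_dt[hit, column2 := NA_character_]
}

# Sketch: build every (date, column4) pair to blank out, then update once
lookup <- data.table(
  datecolumn = as.character(as.vector(rand.mat)),                # column-major
  column4    = as.character(rep(index, each = nrow(rand.mat)))  # same order
)
join_dt <- copy(dt)
join_dt[lookup, on = c("datecolumn", "column4"), column2 := NA_character_]

# Both approaches should blank out the same rows
print(identical(loop_dt$column2, join_dt$column2))
```

The join version converts the dates and index values to character so that the key types match the columns in dt. Whether this is actually faster on the real data would still need to be timed.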