I wrote a for loop in R to fetch information from dataframe2 and add it to dataframe1. The loop takes a date and time from dataframe1 and looks up which "step number" corresponds to it by finding the row of dataframe2 whose start and end times enclose it (start and end are two separate columns, both filled in for every row). It then adds this step number to dataframe1 for that particular row. However, this loop takes hours to run.
I have read that the cause of this long duration is that R has to build up the whole dataframe with each loop iteration (not sure if this is correct, but this is what I understood). I have tried different methods of doing the same thing but I could not get them to work.
This is an example of what I am doing with very small dataframes (the actual data has ~60,000 rows in dataframe1 and 300 in dataframe2):
DateTime <- c("2022-09-20 15:00:00", "2022-09-20 19:00:00", "2022-09-21 15:00:00",
"2022-09-21 19:00:00", "2022-09-22 15:00:00")
Value <- c(1,2,3,4,5)
dataframe1 <- data.frame(DateTime, Value)
Start <- c("2022-09-20 01:00:00", "2022-09-20 17:00:00", "2022-09-21 13:00:00",
"2022-09-21 18:00:00", "2022-09-22 13:00:00")
End <- c("2022-09-20 16:00:00", "2022-09-20 23:59:59", "2022-09-21 17:00:00",
"2022-09-21 23:00:00", "2022-09-22 19:00:00")
Step <- c(1,2,3,4,5)
dataframe2 <- data.frame(Start, End, Step)
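# New column that will receive the step number (0 = no matching step found)
dataframe1$Step <- 0
for (i in 1:nrow(dataframe1)) {
  for (j in 1:nrow(dataframe2)) {
    # The date-times are character strings; comparing them works here only
    # because "YYYY-MM-DD HH:MM:SS" sorts in chronological order.
    if (dataframe1[i, 1] > dataframe2[j, 1] && dataframe1[i, 1] < dataframe2[j, 2]) {
      dataframe1[i, 3] <- dataframe2[j, 3]
    }
  }
}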
First, I create a new column called "Step" to which the step number needs to be added. Then, I loop over each row in dataframe1 to get the date & time of that datapoint. After that, I loop through each row of dataframe2. In dataframe2, the first column has the start time and the second column has the end time of that step.
So if the date and time of a datapoint in dataframe1 is between the start and end time of a row in dataframe2, then the step number in that row of dataframe2 is written to the new "Step" column in dataframe1.
As I said, it works, but it takes a long time and I think there should be a more computationally efficient way to do this.
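For reference, one direction I could imagine is to keep only the loop over the 300 rows of dataframe2 and do the comparison against all of dataframe1 at once. The following is a minimal sketch of that idea, not a tested solution for the real data. It assumes the date-times can be parsed with `as.POSIXct`, and the `tz = "UTC"` argument is my assumption (the real time zone may differ). Like the nested loop, it uses strict inequalities and lets a later matching row of dataframe2 overwrite an earlier one.

```r
# Same example data as above.
dataframe1 <- data.frame(
  DateTime = c("2022-09-20 15:00:00", "2022-09-20 19:00:00", "2022-09-21 15:00:00",
               "2022-09-21 19:00:00", "2022-09-22 15:00:00"),
  Value = c(1, 2, 3, 4, 5)
)
dataframe2 <- data.frame(
  Start = c("2022-09-20 01:00:00", "2022-09-20 17:00:00", "2022-09-21 13:00:00",
            "2022-09-21 18:00:00", "2022-09-22 13:00:00"),
  End = c("2022-09-20 16:00:00", "2022-09-20 23:59:59", "2022-09-21 17:00:00",
          "2022-09-21 23:00:00", "2022-09-22 19:00:00"),
  Step = c(1, 2, 3, 4, 5)
)

# Parse the strings once, outside any loop (the time zone is an assumption).
dt    <- as.POSIXct(dataframe1$DateTime, tz = "UTC")
start <- as.POSIXct(dataframe2$Start, tz = "UTC")
end   <- as.POSIXct(dataframe2$End, tz = "UTC")

# 0 marks rows of dataframe1 that fall inside no step.
dataframe1$Step <- 0

# Loop only over the steps; each comparison covers every row of dataframe1.
for (j in seq_len(nrow(dataframe2))) {
  inside <- dt > start[j] & dt < end[j]
  dataframe1$Step[inside] <- dataframe2$Step[j]
}
```

This replaces roughly 60,000 x 300 single-cell comparisons and assignments with 300 vectorized ones, which is why I would expect it to be faster, but I have not measured it on the full data.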