I want to connect two datasets with each other by adding a new column called Average
. This column is the average of the durations between Date
and Date - diff
. I got two datasets, the first one is called data
and looks like this:
Date Weight diff Loc.nr
2013-01-24 1040 7 2
2013-01-31 1000 7 2
2013-01-19 500 4 9
2013-01-23 1040 4 9
2013-01-28 415 5 9
2013-01-31 650 3 9
The other one is called Rain.duration
, in the column Duration
are the hours of rain on that day. This dataset looks like this:
Date Duration
2013-01-14 4.5
2013-01-15 0.0
2013-01-16 6.9
2013-01-17 0.0
2013-01-18 1.8
2013-01-19 2.1
2013-01-20 0.0
2013-01-21 0.0
2013-01-22 4.3
2013-01-23 0.0
2013-01-24 7.5
2013-01-25 4.7
2013-01-26 0.0
2013-01-27 0.7
2013-01-28 5.0
2013-01-29 0.0
2013-01-30 3.1
2013-01-31 2.8
I made a code to do this:
for(i in 1:nrow(data)) {
for(j in 1:nrow(Rain.duration)) {
if(data$Date[i] == Rain.duration$Date[j]) {
average <- as.array(Rain.duration$Duration[(j-(data$diff[i])):j])
j <- nrow(Rain.duration)
}
}
data$Average[i] <- mean(average)
}
The problem of this code is that, because of the size of my datasets, it takes like 3 days to run. Is there a faster way to do this?
My expected outcome is:
Date Weight diff Loc.nr Average
2013-01-24 1040 7 2 1.96
2013-01-31 1000 7 2 2.98
2013-01-19 500 4 9 2.16
2013-01-23 1040 4 9 1.28
2013-01-28 415 5 9 2.98
2013-01-31 650 3 9 2.73