I have a big data.frame of time-series (>5*10^6 rows) (df_ts
) and a smaller data.frame with manual adjustments for the time-series (df_adj
):
# Example data
sq1 <- seq.Date(as.Date("2017-01-01"), as.Date("2017-01-15"),by = "days")
df_ts <- data.frame(Station = rep(c("A", "B"), each = length(sq1)),
Date = rep(sq1, 2),
Value = rnorm(n = 2* length(sq1)),
ValueAdj = 0)
df_adj <- data.frame(Station = c("A", "A", "B"),
Start = as.Date(c("2017-01-02", "2017-01-10", "2017-01-05")),
End = as.Date(c("2017-01-02", "2017-01-13", "2017-01-05")),
Adj = c(NA, 0.5, -0.6))
Time-steps are not necessary equal as in the example data
I want to change df_ts$ValueAdj
to values from df_adj$Adj
for periods defined by df_adj$Start
and df_adj$End
. Dates are inclusive and values in df_adj$Adj
are numeric or NA.
Below is my solution with a for
loop:
for(i in 1:nrow(df_adj)){
df_ts[df_ts$Station == df_adj$Station[i] &
df_ts$Date >= df_adj$Start[i] &
df_ts$Date <= df_adj$End[i], "ValueAdj"] <-
df_adj$Adj[i]
}
Returns:
Station Date Value ValueAdj
1 A 2017-01-01 -1.9049554 0.0
2 A 2017-01-02 -0.8111702 NA
3 A 2017-01-03 1.3240043 0.0
4 A 2017-01-04 0.6156368 0.0
5 A 2017-01-05 1.0916690 0.0
6 A 2017-01-06 0.3066049 0.0
7 A 2017-01-07 -0.1101588 0.0
8 A 2017-01-08 -0.9243128 0.0
9 A 2017-01-09 1.5929138 0.0
10 A 2017-01-10 0.0450106 0.5
11 A 2017-01-11 -0.7151284 0.5
12 A 2017-01-12 0.8652231 0.5
13 A 2017-01-13 1.0744410 0.5
14 A 2017-01-14 1.8956548 0.0
15 A 2017-01-15 -0.6029973 0.0
16 B 2017-01-01 -0.3908678 0.0
17 B 2017-01-02 -0.4162220 0.0
18 B 2017-01-03 -0.3756574 0.0
19 B 2017-01-04 -0.3666309 0.0
20 B 2017-01-05 -0.2956775 -0.6
21 B 2017-01-06 1.4418204 0.0
22 B 2017-01-07 -0.6975383 0.0
23 B 2017-01-08 -0.3881675 0.0
24 B 2017-01-09 0.6525365 0.0
25 B 2017-01-10 1.1247724 0.0
26 B 2017-01-11 -0.7721108 0.0
27 B 2017-01-12 -0.5080862 0.0
28 B 2017-01-13 0.5236206 0.0
29 B 2017-01-14 1.0177542 0.0
30 B 2017-01-15 -0.2511646 0.0
I am looking for a neater (and faster) solution preferably with dplyr/tidyr. The solution with join %>% filter %>% select
like in the here doesn't work in my case because replacement value could also be NA. I am also not sure how would that perform on a big data.frame.