I think examples are easier to understand, so here is how to generate a small fake data set:
library(tidyr)
# Seven consecutive event days
day_event <- as.Date("2017-03-01") + 0:6
# One column per measurement day; NA where the event is more than 7 days old
a <- rep(1, 7)
b <- c(NA, rep(1, 6))
c <- c(NA, NA, rep(1, 5))
df_1 <- data.frame(day_event, a, b, c)
names(df_1)[2:4] <- c("2017-03-08", "2017-03-09", "2017-03-10")
> df_1
   day_event 2017-03-08 2017-03-09 2017-03-10
1 2017-03-01          1         NA         NA
2 2017-03-02          1          1         NA
3 2017-03-03          1          1          1
4 2017-03-04          1          1          1
5 2017-03-05          1          1          1
6 2017-03-06          1          1          1
7 2017-03-07          1          1          1
My real data arrive in the df_2 (long) format, but with tidyr I can convert from one format to the other:
df_2<-gather(df_1, day_measure, measure, -day_event)
> df_2
    day_event day_measure measure
1  2017-03-01  2017-03-08       1
2  2017-03-02  2017-03-08       1
3  2017-03-03  2017-03-08       1
4  2017-03-04  2017-03-08       1
5  2017-03-05  2017-03-08       1
6  2017-03-06  2017-03-08       1
7  2017-03-07  2017-03-08       1
8  2017-03-01  2017-03-09      NA
9  2017-03-02  2017-03-09       1
10 2017-03-03  2017-03-09       1
11 2017-03-04  2017-03-09       1
12 2017-03-05  2017-03-09       1
13 2017-03-06  2017-03-09       1
14 2017-03-07  2017-03-09       1
15 2017-03-01  2017-03-10      NA
16 2017-03-02  2017-03-10      NA
17 2017-03-03  2017-03-10       1
18 2017-03-04  2017-03-10       1
19 2017-03-05  2017-03-10       1
20 2017-03-06  2017-03-10       1
21 2017-03-07  2017-03-10       1
For context, each row is a measure of an event that happened on day_event. The value measured for a given day_event can differ depending on the day the measure is taken (day_measure).
My problem is that I only measure events up to seven days back: that is why the measure on day_measure = '2017-03-09' for day_event = '2017-03-01' is NA.
I would like to replace this NA with the last measure performed (the one taken 7 days after day_event): in this case, the measure made on '2017-03-08'.
I tried:
for (i in 1:length(df_2$measure)){
  row <- df_2[i,]
  # If the measure was taken more than 7 days after the event and a day-7 measure exists, copy it
  if (row$day_event + 7 < row$day_measure & length(df_2[df_2$day_event == row$day_event & df_2$day_measure == row$day_event + 7,]$measure) > 0){
    row$measure <- df_2[df_2$day_event == row$day_event & df_2$day_measure == row$day_event + 7,]$measure
    df_2[i,] <- row
  }
}
It worked :) But on my real data set, which is much larger, it takes forever :(
I think R doesn't like such loops! Can you think of another method?
Thanks for your help!
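
One possible vectorized alternative to the loop, offered only as a sketch: build a lookup key for every row and, for each row, the key of the row that holds the day-7 measure of the same event, then let match() find all the reference rows at once. The function name fill_from_day7 and the helper names (key, ref_key, idx, late) are made up for this illustration. It assumes each (day_event, day_measure) pair appears at most once and uses base R only, so the toy df_2 is rebuilt here without tidyr.

```r
# Rebuild the toy long-format data from the question (base R only).
day_event <- as.Date("2017-03-01") + 0:6
df_2 <- data.frame(
  day_event   = rep(day_event, times = 3),
  day_measure = rep(c("2017-03-08", "2017-03-09", "2017-03-10"), each = 7),
  stringsAsFactors = FALSE
)
# Measures exist only up to 7 days after the event, as in the question.
df_2$measure <- ifelse(as.Date(df_2$day_measure) <= df_2$day_event + 7, 1, NA)

# Hypothetical helper: same rule as the loop, applied to all rows at once.
fill_from_day7 <- function(df) {
  event       <- as.Date(df$day_event)
  measure_day <- as.Date(df$day_measure)
  # Key of each row, and key of the row holding the day-7 measure of the same event.
  key     <- paste(event, measure_day)
  ref_key <- paste(event, event + 7)
  idx     <- match(ref_key, key)  # NA when no day-7 row exists
  # Rows measured more than 7 days after the event, with a day-7 row available.
  late <- measure_day > event + 7 & !is.na(idx)
  df$measure[late] <- df$measure[idx[late]]
  df
}

df_filled <- fill_from_day7(df_2)
```

Like the loop, this overwrites every measure taken more than 7 days after the event, not only the NA ones; adding `& is.na(df$measure)` to the `late` condition would restrict it to missing values.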