I am looking for a solution similar to the one provided here: How to fill NAs with LOCF by factors in data frame, split by country, but I wish to have a modification to that solution that incorporates a specific number of days to fill forward (i.e. last value in Cutoff column).
I have a large dataframe with missing days (here is the df for one individual, I have 60+)
'data.frame': 387 obs. of 14 variables:
$ Date : POSIXct, format: "2014-08-13" "2014-08-14" "2014-08-15" "2014-08-16" ...
$ MEID.1 : chr NA NA NA "14" ...
$ MEID.2 : Factor w/ 184 levels "1","100","100.1",..: NA NA NA 143 48 NA NA NA NA NA ...
$ MEID.3 : Factor w/ 180 levels "100","100.1",..: NA NA NA 24 134 NA NA NA NA NA ...
$ MEID.4 : Factor w/ 42 levels "173","173a","173b",..: NA NA NA 17 1 NA NA NA NA NA ...
$ MEID.5 : Factor w/ 3 levels "d1","s1","s2": NA NA NA 2 3 NA NA NA NA NA ...
$ MEID.6 : Factor w/ 1 level "s2": NA NA NA NA NA NA NA NA NA NA ...
$ DAYT : int NA NA NA 1 8 NA NA NA NA NA ...
$ DATT : int NA NA NA 1 1 NA NA NA NA NA ...
$ Reason.For.Change: chr "0" "0" "0" "0" ...
$ GAP_Days : chr "1" "2" "3" "NA" ...
$ Max : chr "3" "3" "3" "NA" ...
$ GAP_LABEL_FINAL : chr "1" "1" "1" "NA" ...
$ Cutoff : num 2 2 2 NA NA 4 4 4 4 4 ...
Here is the result of head()
> head(final_daily_intake3)
Date MEID.1 MEID.2 MEID.3 MEID.4 MEID.5 MEID.6 DAYT DATT Reason.For.Change GAP_Days Max GAP_LABEL_FINAL
1 2014-08-13 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 0 1 3 1
2 2014-08-14 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 0 2 3 1
3 2014-08-15 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 0 3 3 1
4 2014-08-16 14 61 117 187 s1 <NA> 1 1 0 NA NA NA
5 2014-08-17 14a 193 56 173 s2 <NA> 8 1 0 NA NA NA
6 2014-08-18 <NA> <NA> <NA> <NA> <NA> <NA> NA NA 0 1 7 2
Cutoff
1 2
2 2
3 2
4 NA
5 NA
6 4
I would like to use LOCF methods described in other posts (Filling in missing (blanks) in a data table, per category - backwards and forwards, R -- Carry last observation forward n times), but I have a slightly different question - it is possible to specify LOCF n
times when n
is variable but specified in a column. In my df, n
is specified in column Cutoff.
In other words, I would like LOCF to fill in forwards for n
days listed in Cutoff.
Any help is appreciated.
Example Before:
4 2014-08-16 14 61 117 187 s1 NA 1 1 0 NA NA NA NA
5 2014-08-17 14a 193 56 173 s2 NA 8 1 0 NA NA NA NA
6 2014-08-18 NA NA NA NA NA NA NA NA 0 1 7 2 4
7 2014-08-19 NA NA NA NA NA NA NA NA 0 2 7 2 4
8 2014-08-20 NA NA NA NA NA NA NA NA 0 3 7 2 4
9 2014-08-21 NA NA NA NA NA NA NA NA 0 4 7 2 4
10 2014-08-22 NA NA NA NA NA NA NA NA 0 5 7 2 4
11 2014-08-23 NA NA NA NA NA NA NA NA 0 6 7 2 4
12 2014-08-24 NA NA NA NA NA NA NA NA 0 7 7 2 4
13 2014-08-25 4 21 80 187 s2 NA 1 1 0 NA NA NA NA
14 2014-08-26 9 1 120 173 s2 NA 1 1 0 NA NA NA NA
15 2014-08-27 13 15 100 187 s2 NA 1 1 0 NA NA NA NA
16 2014-08-28 5 21 17 173 s2 NA 1 1 0 NA NA NA NA
Example After:
4 2014-08-16 14 61 117 187 s1 NA 1 1 0 NA NA NA NA
5 2014-08-17 14a 193 56 173 s2 NA 8 1 0 NA NA NA NA
6 2014-08-18 14a 193 56 173 s2 NA 8 1 0 1 7 2 4
7 2014-08-19 14a 193 56 173 s2 NA 8 1 0 2 7 2 4
8 2014-08-20 14a 193 56 173 s2 NA 8 1 0 3 7 2 4
9 2014-08-21 14a 193 56 173 s2 NA 8 1 0 4 7 2 4
10 2014-08-22 NA NA NA NA NA NA NA NA 0 5 7 2 4
11 2014-08-23 NA NA NA NA NA NA NA NA 0 6 7 2 4
12 2014-08-24 NA NA NA NA NA NA NA NA 0 7 7 2 4
13 2014-08-25 4 21 80 187 s2 NA 1 1 0 NA NA NA NA
14 2014-08-26 9 1 120 173 s2 NA 1 1 0 NA NA NA NA
15 2014-08-27 13 15 100 187 s2 NA 1 1 0 NA NA NA NA
16 2014-08-28 5 21 17 173 s2 NA 1 1 0 NA NA NA NA
Note that the expansion is only applied once. In other words, the first four NA rows in the missing data time period labeled 2
were filled in (rows 6-9), and not 10-12. This is because 10-12 are Day 5,6,7 of the gap time period, not Days 1-4.