1

I am looking for a solution similar to the one provided here: How to fill NAs with LOCF by factors in data frame, split by country, but I wish to have a modification to that solution that incorporates a specific number of days to fill forward (i.e. last value in Cutoff column).

I have a large dataframe with missing days (here is the df for one individual, I have 60+)

'data.frame':   387 obs. of  14 variables:
 $ Date             : POSIXct, format: "2014-08-13" "2014-08-14" "2014-08-15" "2014-08-16" ...
 $ MEID.1           : chr  NA NA NA "14" ...
 $ MEID.2           : Factor w/ 184 levels "1","100","100.1",..: NA NA NA 143 48 NA NA NA NA NA ...
 $ MEID.3           : Factor w/ 180 levels "100","100.1",..: NA NA NA 24 134 NA NA NA NA NA ...
 $ MEID.4           : Factor w/ 42 levels "173","173a","173b",..: NA NA NA 17 1 NA NA NA NA NA ...
 $ MEID.5           : Factor w/ 3 levels "d1","s1","s2": NA NA NA 2 3 NA NA NA NA NA ...
 $ MEID.6           : Factor w/ 1 level "s2": NA NA NA NA NA NA NA NA NA NA ...
 $ DAYT             : int  NA NA NA 1 8 NA NA NA NA NA ...
 $ DATT             : int  NA NA NA 1 1 NA NA NA NA NA ...
 $ Reason.For.Change: chr  "0" "0" "0" "0" ...
 $ GAP_Days         : chr  "1" "2" "3" "NA" ...
 $ Max              : chr  "3" "3" "3" "NA" ...
 $ GAP_LABEL_FINAL  : chr  "1" "1" "1" "NA" ...
 $ Cutoff           : num  2 2 2 NA NA 4 4 4 4 4 ...

Here is the result of head()

> head(final_daily_intake3)
        Date MEID.1 MEID.2 MEID.3 MEID.4 MEID.5 MEID.6 DAYT DATT Reason.For.Change GAP_Days Max GAP_LABEL_FINAL
1 2014-08-13   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   NA   NA                 0        1   3               1
2 2014-08-14   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   NA   NA                 0        2   3               1
3 2014-08-15   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   NA   NA                 0        3   3               1
4 2014-08-16     14     61    117    187     s1   <NA>    1    1                 0       NA  NA              NA
5 2014-08-17    14a    193     56    173     s2   <NA>    8    1                 0       NA  NA              NA
6 2014-08-18   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>   NA   NA                 0        1   7               2
  Cutoff
1      2
2      2
3      2
4     NA
5     NA
6      4

I would like to use LOCF methods described in other posts (Filling in missing (blanks) in a data table, per category - backwards and forwards, R -- Carry last observation forward n times), but I have a slightly different question - it is possible to specify LOCF n times when n is variable but specified in a column. In my df, n is specified in column Cutoff.

In other words, I would like LOCF to fill in forwards for n days listed in Cutoff.

Any help is appreciated.

Example Before:

4   2014-08-16  14  61  117 187 s1  NA  1   1   0   NA  NA  NA  NA
5   2014-08-17  14a 193 56  173 s2  NA  8   1   0   NA  NA  NA  NA
6   2014-08-18  NA  NA  NA  NA  NA  NA  NA  NA  0   1   7   2   4
7   2014-08-19  NA  NA  NA  NA  NA  NA  NA  NA  0   2   7   2   4
8   2014-08-20  NA  NA  NA  NA  NA  NA  NA  NA  0   3   7   2   4
9   2014-08-21  NA  NA  NA  NA  NA  NA  NA  NA  0   4   7   2   4
10  2014-08-22  NA  NA  NA  NA  NA  NA  NA  NA  0   5   7   2   4
11  2014-08-23  NA  NA  NA  NA  NA  NA  NA  NA  0   6   7   2   4
12  2014-08-24  NA  NA  NA  NA  NA  NA  NA  NA  0   7   7   2   4
13  2014-08-25  4   21  80  187 s2  NA  1   1   0   NA  NA  NA  NA
14  2014-08-26  9   1   120 173 s2  NA  1   1   0   NA  NA  NA  NA
15  2014-08-27  13  15  100 187 s2  NA  1   1   0   NA  NA  NA  NA
16  2014-08-28  5   21  17  173 s2  NA  1   1   0   NA  NA  NA  NA

Example After:

4   2014-08-16  14  61  117 187 s1  NA  1   1   0   NA  NA  NA  NA
5   2014-08-17  14a 193 56  173 s2  NA  8   1   0   NA  NA  NA  NA
6   2014-08-18  14a 193 56  173 s2  NA  8   1   0   1   7   2   4
7   2014-08-19  14a 193 56  173 s2  NA  8   1   0   2   7   2   4
8   2014-08-20  14a 193 56  173 s2  NA  8   1   0   3   7   2   4
9   2014-08-21  14a 193 56  173 s2  NA  8   1   0   4   7   2   4
10  2014-08-22  NA  NA  NA  NA  NA  NA  NA  NA  0   5   7   2   4
11  2014-08-23  NA  NA  NA  NA  NA  NA  NA  NA  0   6   7   2   4
12  2014-08-24  NA  NA  NA  NA  NA  NA  NA  NA  0   7   7   2   4
13  2014-08-25  4   21  80  187 s2  NA  1   1   0   NA  NA  NA  NA
14  2014-08-26  9   1   120 173 s2  NA  1   1   0   NA  NA  NA  NA
15  2014-08-27  13  15  100 187 s2  NA  1   1   0   NA  NA  NA  NA
16  2014-08-28  5   21  17  173 s2  NA  1   1   0   NA  NA  NA  NA

Note that the expansion is only applied once. In other words, the first four NA rows in the missing data time period labeled 2 were filled in (rows 6-9), and not 10-12. This is because 10-12 are Day 5,6,7 of the gap time period, not Days 1-4.

Community
  • 1
  • 1
user3795577
  • 187
  • 1
  • 18

0 Answers0