The zoo package and its na.locf() function (last observation carried forward) can help here, as described by Dirk Eddelbuettel in "Replacing NAs with latest non-NA value".
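library(data.table)
library(zoo)
DT <- data.table(a = c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0))
# the non-NA values of 'a', in order
non_nas <- DT[!is.na(a), a]
# the next non-NA value for each of them (0 is appended so both vectors have the same length)
successor <- c(non_nas[-1], 0)
# 1 where the next non-NA value differs, 0 where it is the same ('a' only holds 0/1)
diff <- abs(non_nas - successor)
# write the flags back to the non-NA rows only; the NA rows get NA
DT[!is.na(a), diff := diff]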
This will give you a data table as follows:
     a diff
 1:  0    0
 2: NA   NA
 3: NA   NA
 4:  0    1
 5: NA   NA
 6:  1    0
 7:  1    1
 8: NA   NA
 9:  0    1
10: NA   NA
11:  1    1
12: NA   NA
13: NA   NA
14: NA   NA
15:  0    1
16:  1    0
17:  1    1
18:  0    0
19: NA   NA
20:  0    0
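The idea here is that every 1 in the 'diff' column marks a non-NA value whose next non-NA value is different, so the NAs directly below it (if any) sit between two different values. For example, row 4 holds 0 and the next non-NA value (row 6) is 1, so row 4 gets diff = 1 and the NA in row 5 is one of the rows we are looking for.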
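The same change flag can also be computed with base R's diff() function, which returns the differences between consecutive elements. This is a minimal sketch, assuming 'a' is a plain numeric vector holding the same values as DT$a; the names 'change' and 'flag' are only illustrative. It calls base::diff() explicitly because the code above stores its result in a variable named diff. One small difference from the placeholder 0 used above: here the last non-NA value always gets 0 (no change), which agrees with the original whenever the last non-NA value is 0, as in this data.

```r
# Same values as column 'a' in DT above
a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)

non_nas <- a[!is.na(a)]
# base::diff() returns non_nas[i + 1] - non_nas[i]; abs() turns any change into 1 ('a' is 0/1)
# The last non-NA value has no successor, so it gets 0 (no change)
change <- c(abs(base::diff(non_nas)), 0)

# Spread the flags back over the full length, leaving the NA rows as NA
flag <- rep(NA_real_, length(a))
flag[!is.na(a)] <- change
flag
```

The resulting vector matches the 'diff' column shown above.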
Next, fill the NAs in the 'diff' column by carrying the last non-NA value forward, so that each NA inherits the flag of the nearest non-NA value above it. For clarity, the result goes into a new column 'b'. This is where the zoo package comes into play:
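# na.rm = FALSE keeps leading NAs (if 'a' starts with NA) so the result still has one value per row
DT[, b := na.locf(diff, na.rm = FALSE)]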
This results in:
     a diff b
 1:  0    0 0
 2: NA   NA 0
 3: NA   NA 0
 4:  0    1 1
 5: NA   NA 1
 6:  1    0 0
 7:  1    1 1
 8: NA   NA 1
 9:  0    1 1
10: NA   NA 1
11:  1    1 1
12: NA   NA 1
13: NA   NA 1
14: NA   NA 1
15:  0    1 1
16:  1    0 0
17:  1    1 1
18:  0    0 0
19: NA   NA 0
20:  0    0 0
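zoo is not strictly required for this step: data.table itself provides nafill(x, type = "locf") in recent versions, and in base R a last-observation-carried-forward can be sketched with cummax() over the positions of the non-NA values. The function name locf() below is hypothetical. Unlike na.locf() with its default na.rm = TRUE, it keeps leading NAs as NA, so the result always has the same length as the input.

```r
# Hypothetical base-R stand-in for zoo::na.locf(x, na.rm = FALSE) on a plain vector
locf <- function(x) {
  # position of each non-NA element, 0 for the NAs
  pos <- seq_along(x) * !is.na(x)
  # running maximum = position of the most recent non-NA value seen so far
  idx <- cummax(pos)
  # leading NAs have nothing to carry forward; an NA index yields NA
  idx[idx == 0] <- NA
  x[idx]
}

# The 'diff' column from above
d <- c(0, NA, NA, 1, NA, 0, 1, NA, 1, NA, 1, NA, NA, NA, 1, 0, 1, 0, NA, 0)
b <- locf(d)
b
```

Inside the data table, this hypothetical helper would be used as DT[, b := locf(diff)], giving the same 'b' column as shown above.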
Finally,
DT[is.na(a) & b == 1, which = TRUE]
returns the row numbers of the NAs that lie between two different values:
[1]  5  8 10 12 13 14
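For comparison, here is a different route to the same rows that uses base R's findInterval() instead of na.locf(): for each NA it looks up the nearest non-NA value before and after it and keeps the NA if the two differ. The function name na_between_changes() is hypothetical. It behaves slightly differently from the approach above in two ways: it compares values with !=, so 'a' does not have to be 0/1, and NAs with no non-NA value on one side (at the very start or end) are never returned, whereas the placeholder 0 above would flag trailing NAs that follow a final 1.

```r
# Hypothetical helper: row numbers of NAs whose nearest non-NA neighbours differ
na_between_changes <- function(a) {
  non_na <- which(!is.na(a))
  na_pos <- which(is.na(a))
  # k = number of non-NA positions before each NA,
  # i.e. the index (within non_na) of the previous non-NA value; 0 if there is none
  k <- findInterval(na_pos, non_na)
  # keep only NAs that have a non-NA value on both sides
  inside <- k > 0 & k < length(non_na)
  k <- k[inside]
  before <- a[non_na[k]]
  after <- a[non_na[k + 1]]
  na_pos[inside][before != after]
}

a <- c(0, NA, NA, 0, NA, 1, 1, NA, 0, NA, 1, NA, NA, NA, 0, 1, 1, 0, NA, 0)
na_between_changes(a)
```

On this data it returns the same rows, 5 8 10 12 13 14.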