Consider the following dataset:
dat<-data.frame(id = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3),
var1 = c("A","NA","B","A","NA","NA","B","A","NA","NA","NA","C","A","NA","B","A","NA","NA","D","A","NA","NA","B"))
dat
First, I need to fill in all NA's with the values on either side of the NA's, which was successful in dplyr with:
mutate(value = ifelse(is.na(value), paste0(na.locf(value), "-", na.locf(value, fromLast=TRUE)),
value))
This resulted in:
id var1
1 1 A
2 1 A-B
3 1 B
4 1 A
5 1 A-B
6 1 A-B
7 1 B
8 2 A
9 2 A-C
10 2 A-C
11 2 A-C
12 2 C
13 2 A
14 2 A-B
15 2 B
16 3 A
17 3 A-D
18 3 A-D
19 3 D
20 3 A
21 3 A-B
22 3 A-B
23 3 B
However, I now need to keep some values while return others back to NA, based on the consecutive run lengths of the repeats (grouped by id col). If the consecutive repeat of A-B is longer than 1 then return all values to NA, and if the consecutive repeat of A-C is longer than 2 then return all values to NA, and if the consecutive repeat of A-D is longer than 3 then return all values to NA.
My desired outcome is:
id var1
1 1 A
2 1 A-B
3 1 B
4 1 A
5 1 NA
6 1 NA
7 1 B
8 2 A
9 2 NA
10 2 NA
11 2 NA
12 2 C
13 2 A
14 2 A-B
15 2 B
16 3 A
17 3 A-D
18 3 A-D
19 3 D
20 3 A
21 3 NA
22 3 NA
23 3 B
I think this could be accomplished by some combo of group_by(id)
, then rle()
or data.table's rleid()
, then conditionally turn the values back to NA based on the value and the run length with case_when (I considered ifelse()
but I have many more conditions than provided in the example and have read that case_when would be a better choice), but I have not been able to figure out how to write the precise code to do this. A similar question I've come across is Replace NA with previous value with limit however, it is a much simpler version of what I need to do.
Any suggestions would be much appreciated. I feel like I am close but I need assistance to get me to the desired result.