I have a large R data.table
with a multi column key, where some value columns contain some NA. I'd like to remove groups that are entirely NA in one or more value columns, but keep the whole group otherwise. Repeating this for each column of the key.
To give a simplified example:
library(data.table)
DT = data.table(
Series = rep(letters[1:12], each = 3),
Id = 1:12,
Value1 = c(1:3, NA, 5:9, rep(NA,3), 1:3, NA, 5:9, rep(NA,3), 1:3, NA, 5:9, rep(NA,3)),
Value2 = c(rep(NA,3), 1:4, NA, 6:9, rep(NA,3), 1:9, 1:9, rep(NA,3)))
DT
Series Id Value1 Value2
1: a 1 1 NA
2: a 2 2 NA
3: a 3 3 NA
4: b 4 NA 1
5: b 5 5 2
6: b 6 6 3
7: c 7 7 4
8: c 8 8 NA
9: c 9 9 6
10: d 10 NA 7
11: d 11 NA 8
12: d 12 NA 9
13: e 1 1 NA
14: e 2 2 NA
15: e 3 3 NA
16: f 4 NA 1
17: f 5 5 2
18: f 6 6 3
19: g 7 7 4
20: g 8 8 5
21: g 9 9 6
22: h 10 NA 7
23: h 11 NA 8
24: h 12 NA 9
25: i 1 1 1
26: i 2 2 2
27: i 3 3 3
28: j 4 NA 4
29: j 5 5 5
30: j 6 6 6
31: k 7 7 7
32: k 8 8 8
33: k 9 9 9
34: l 10 NA NA
35: l 11 NA NA
36: l 12 NA NA
Series Id Value1 Value2
So I would like to drop:
- Series: a,d,e,h and l
- Ids: 4, 10,11 and 12
Correct result should look like:
Series Id Value1 Value2
1: b 5 5 2
2: b 6 6 3
3: c 7 7 4
4: c 8 8 NA
5: c 9 9 6
6: f 5 5 2
7: f 6 6 3
8: g 7 7 4
9: g 8 8 5
10: g 9 9 6
11: i 1 1 1
12: i 2 2 2
13: i 3 3 3
14: j 5 5 5
15: j 6 6 6
16: k 7 7 7
17: k 8 8 8
18: k 9 9 9
Series Id Value1 Value2
What I managed so far:
I can find the Series that are NA for Value1 like this:
DT[, sum(1-is.na(Value1)) == 0, by = Series][V1 == TRUE]
And I could even do
setkey(DT, Series)
DT = DT[DT[, sum(1-is.na(Value)) == 0, by = Series][V1 != TRUE]]
But now I end up with V1 appearing in the final table.