I would like to delete rows where column a has duplicate values based on the number of NAs in other columns. This is similar to this, but I can't get the counting NAs to work with the solutions there.
Deleting rows that are duplicated in one column based on the conditions of another column
Here is my toy dataset:
df1 <- data.frame(a = c("x","y","y","z","x", "z"), b = c(1,2,NA,4,8,3), c = c(NA,2,2,NA,NA,4), d= c(1:4,NA,NA))
which gives:
a b c d
1 x 1 NA 1
2 y 2 2 2
3 y NA 2 3
4 z 4 NA 4
5 x 8 NA NA
6 z 3 4 NA
I would like to only keep rows with unique values in column a, and keep only the rows with the least number of NAs in cols b & c (ignoring NAs in column d)
This is the code I came up with:
df1 %>%
mutate(NAs= apply(is.na(cbind(b,c)), 1, sum)) %>%
group_by(a) %>%
top_n(n=1, -NAs)
My problem is that top_n returns more than one row if there is a tie. In case of a tie, I just want the first row returned. And there's probably a better way to select columns in mutate than cbind. I also don't need the "NAs" variable i created using mutate. My desired output is this:
a b c d
x 1 NA 1
y 2 2 2
z 3 4 NA