1

I'm trying to clean up a dataset and encountering a problem, 3 colums are related so before i can continue to do some clean up, I'd like to remove the NAs ONLY when they all 3 have them.

example

      A.   B.   C.    
1.-   NA.  NA.  NA. <- to be removed
2.-   NA.  NA.  10. <- not to be removed
3.-   NA.  29.  NA  <- not to be removed
4.-   NA.  NA.  NA. <  to be removed

I have tried so far with:

subset(data, data$A == NA & data$B == NA & data$C == NA)

data_new <- data[complete.cases(data$A) & (data$B) & (data$C), ]

but nothing seems to work.

Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
Xams
  • 57
  • 6

2 Answers2

2

The complete.cases code can be with | condition as complete.cases returns TRUE for a non-NA value and FALSE for NA. Thus, by using the OR, we are subsetting a row having at least one non-NA

data[complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C),]

Or more easily with rowSums

data[rowSums(is.na(data[, c("A", "B", "C")])) < 3,]

Or with dplyr with if_all or if_any

library(dplyr)
data %>% 
  filter(!if_all(c(A, B, C), is.na))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

R does not use the == operator on NA, use is.na() function instead. See in R does NA == NA?

Data

DF<-data.frame(ID=c(1:4),
               A=c(NA, NA, NA, NA),
               B=c(NA, NA, 29, NA),
               C=c(NA, 10, NA, NA))

Base solution

DF[!(is.na(DF$A) & is.na(DF$B) & is.na(DF$C)),]
  ID  A  B  C
2  2 NA NA 10
3  3 NA 29 NA

In your original subset() attempt, subset does not need to repeat the data.frame when identifying the column in the filter section (eg DF$A vs A).

subset(DF, !(is.na(A) & is.na(B) & is.na(C)))
#or
subset(DF, !is.na(A) | !is.na(B) | !is.na(C))
M.Viking
  • 5,067
  • 4
  • 17
  • 33