
I am a grad student using R and have been reading the other Stack Overflow answers about removing rows that contain NA from data frames. I have tried both na.omit and complete.cases. With each one, the printed output shows that the rows with NA have been removed, but when I then call summary() on the data frame it still includes the NAs. Are the rows with NA actually removed, or am I doing this wrong?

na.omit(Perios)
summary(Perios)

Perios[complete.cases(Perios),]
summary(Perios)
Bernhard Barker
Sam
  • Figured it out! Periosna=Perios[is.na(Perios$Periostitis)==FALSE,] summary(Periosna) – Sam Feb 27 '14 at 21:36
  • If you feel that this might be helpful to future visitors, please post an answer with your solution, and accept it. Otherwise, please delete this question. – Bernhard Barker Mar 10 '14 at 14:21
  • OP, what Dukeling said. Either post answer or delete please. How is `complete.cases` different to `Perios[!is.na(Perios$Periostitis),]` , I'm curious? – smci Mar 30 '14 at 09:51
  • @smci It's the same if and only if there are no other columns containing `NA` values. – Matthew Lundberg Mar 31 '14 at 14:05

2 Answers

  1. The error is that you never assigned the output of na.omit. It returns a filtered copy and leaves Perios itself unchanged, so you have to store the result:

    Perios <- na.omit(Perios)

  2. If you know which column the NAs occur in, then you can just do

    Perios[!is.na(Perios$Periostitis),]

or more generally:

Perios[!is.na(Perios$colA) & !is.na(Perios$colD) & ... ,]

Then as a general safety tip for R, throw in an na.fail to assert it worked:

na.fail(Perios)  # trust, but verify! Paranoia is healthy.
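
As a rough illustration of the points above, here is a minimal sketch on a made-up stand-in for Perios. The Age column and all the values are assumptions for the example, not the asker's data. It shows that na.omit leaves the original untouched until the result is assigned, that filtering on a single column can leave NAs in other columns, and that na.fail passes only once every NA is gone.

```r
# Toy stand-in for the asker's data frame; columns and values are made up.
Perios <- data.frame(Periostitis = c(1, NA, 0), Age = c(30, 41, NA))

na.omit(Perios)                    # prints the filtered rows only
n_before <- nrow(Perios)           # still 3: Perios itself was not modified

Perios_clean <- na.omit(Perios)    # assigning the result is what keeps it
n_clean <- nrow(Perios_clean)      # 1: only the fully observed row remains

# Filtering on one column keeps rows that have NA in other columns.
Perios_col <- Perios[!is.na(Perios$Periostitis), ]
n_col <- nrow(Perios_col)          # 2: the row with a missing Age survives

# na.fail returns its argument when no NA is present and errors otherwise.
checked <- na.fail(Perios_clean)
fails <- inherits(try(na.fail(Perios_col), silent = TRUE), "try-error")
```

Assigning to a new name such as Perios_clean, rather than overwriting Perios, is only a choice made here so the before and after can be compared.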
smci

is.na on its own is not the proper function. You want complete.cases, which is the equivalent of function(x) apply(!is.na(x), 1, all), or na.omit, to filter the data:

That is, you want all rows where there are no NA values.

> x <- data.frame(a=c(1,2,NA), b=c(3,NA,NA))
> x
   a  b
1  1  3
2  2 NA
3 NA NA

> x[complete.cases(x),]
  a b
1 1 3

> na.omit(x)
  a b
1 1 3

Neither call modifies x. Assign the result back to x (for example, x <- na.omit(x)) to keep the filtered data.
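
To see how complete.cases relates to is.na, here is a small sketch using the same x as above. The variable names keep_cc, keep_isna, and x_clean are made up for the example. It builds the row filter both ways and compares the results.

```r
x <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA))

keep_cc   <- complete.cases(x)           # one logical per row
keep_isna <- apply(!is.na(x), 1, all)    # TRUE where no cell in the row is NA

same_filter <- all(keep_cc == keep_isna) # the two filters agree on every row

x_clean <- x[keep_cc, ]                  # assign the result to keep it
```

On this data both filters keep only the first row, so x_clean has a single row and no NA values.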

complete.cases returns a logical vector with one element per row of the input data frame. is.na, on the other hand, returns a logical matrix with one element per cell. Indexing with that matrix is not appropriate for returning complete cases, but it can return all non-NA values as a vector:

> is.na(x)
         a     b
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,]  TRUE  TRUE


> x[!is.na(x)]
[1] 1 2 3
Matthew Lundberg