I'm new to R so please excuse my very basic question:

I have a data frame that has a lot of missing data. I've used na.omit to remove missing data as in:

data2 <- na.omit(data1)

However, some of the variables are factors that still seem to have "" as one of the categories, as in:

> str(data2$smoker)
 Factor w/ 3 levels "","No","Yes": 2 2 2 2 2 2 2 3 3 2 ...

When I look at "data2", it still has missing values. What am I doing wrong?

Help and advice much appreciated.

Greg

Greg Martin
  • Unless you tell R to do so when reading in your file, `""` is not considered missing. How did you read in your data? – Heroka Oct 05 '15 at 15:19
  • By default, `""` is not a missing value. Missing values are declared by the logical constant `NA` (see help `?NA`) or, in case of a character value, by `NA_character_`. You could e.g. try `na.omit(factor(data2$smoker, levels = c("No", "Yes")))`. – lukeA Oct 05 '15 at 15:21
  • @Heroka - thanks - I simply did a read.csv – Greg Martin Oct 05 '15 at 16:01
  • @lukeA - thanks - that seems to have helped a lot – Greg Martin Oct 05 '15 at 16:01
  • @lukeA I seem to have a new problem now. If I try to create a table using one of the variables for which I have now removed all of the "" values, I get an error that states "all arguments must have the same length". Is there a way to remove all of the rows for which the "" was removed from a particular variable? – Greg Martin Oct 05 '15 at 16:33
  • @drgregmartin yes. E.g. `df <- data.frame(a = 1:3, b = factor(c("", "yes", "no"))); df[df == ""] <- NA; na.omit(df); na_rows <- is.na(df$b); df[!na_rows, ]`. BTW: You should edit your question and provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lukeA Oct 05 '15 at 17:18

1 Answer

NA is not the same as "".

What is the difference?

  • NA indicates a missing value
  • "" is an empty string, which is a type of value

na.omit will remove NA values, but it will not remove empty strings.
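To see the difference concretely, here is a minimal sketch. The `toy` data frame and its columns are made up for illustration and are not the asker's data; `stringsAsFactors = TRUE` is set only to mimic the factor column shown in the question.

```r
# Hypothetical toy data: one true NA and one empty string in `smoker`
toy <- data.frame(
  id     = 1:4,
  smoker = c("No", "", NA, "Yes"),
  stringsAsFactors = TRUE   # mimic the factor column from the question
)

# Only the real NA counts as missing; "" is an ordinary value
is.na(toy$smoker)

# na.omit() drops the row containing NA but keeps the row containing ""
cleaned <- na.omit(toy)
nrow(cleaned)

# "" is still one of the factor levels after na.omit()
levels(cleaned$smoker)
```

The row holding the empty string survives, which matches what `str(data2$smoker)` shows in the question.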

I suggest turning "" into NA before using na.omit:

data1[data1$smoker == "", "smoker"] <- NA
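Put together, the whole workflow might look like the sketch below. The `data1` built here is a hypothetical stand-in with invented values. Two steps are my additions rather than part of the one-liner above: `which()` guards against rows where `smoker` is already `NA` (a logical index containing `NA` can raise an error in data-frame assignment), and `droplevels()` removes the now-unused "" level.

```r
# Hypothetical stand-in for data1; the values are invented for illustration
data1 <- data.frame(
  age    = c(34, 51, NA, 46, 29),
  smoker = c("No", "", "Yes", NA, "Yes"),
  stringsAsFactors = TRUE
)

# which() keeps only the positions where the comparison is TRUE,
# silently skipping rows where smoker is already NA
empty <- which(data1$smoker == "")
data1[empty, "smoker"] <- NA

# Every row with an NA in any column is now dropped
data2 <- na.omit(data1)

# Remove the unused "" level so it no longer shows up in str() or table()
data2$smoker <- droplevels(data2$smoker)

str(data2$smoker)
table(data2$smoker)
```

If the data comes from `read.csv`, another option is to treat empty strings as missing at read time, e.g. `read.csv("yourfile.csv", na.strings = c("", "NA"))`, where the file name is a placeholder.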
sdgfsdh