6

I have a problem with NA values in a factor variable: ggplot includes them in the plot as if they were another category/level. I would like to drop the missing data. I am sorry I don't have code handy at the moment; I tried to remove factor levels from a dataset that I found via data(), and it did not work.

Has anyone had the same problem?

I tried the solution suggested in "Remove unused factor levels from a ggplot bar plot", but I get an error:

Error: unexpected symbol in: mycode

Can someone suggest something?

Also, if there is no way to remove them from inside the ggplot code, how can I remove the NAs from a factor variable?

zx8754
Pulse

3 Answers

6

Assuming your data is in a data frame called dat:

newdat <- dat[!is.na(dat$Factor), ]

I'm not sure how to solve the problem inside the ggplot code itself.
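As a minimal sketch of the subsetting above: the data frame `dat` here is made up for illustration (its `value` column and contents are assumptions, not the asker's data). It shows that the filter drops whole rows, and that the same expression can be passed directly as the `data` argument of `ggplot()` rather than stored in `newdat` first.

```r
library(ggplot2)

# Hypothetical example data: `Factor` contains one missing value.
dat <- data.frame(
  Factor = factor(c("a", "b", NA, "a", "b")),
  value  = c(1, 2, 3, 4, 5)
)

# Keep only the rows where Factor is not NA.
# The entire row is dropped, not just the NA cell.
newdat <- dat[!is.na(dat$Factor), ]

# The same subset can be passed straight to ggplot().
p <- ggplot(dat[!is.na(dat$Factor), ], aes(x = Factor)) +
  geom_bar()
```

Printing `p` draws the bar chart from the filtered rows only; `dat` itself is left unchanged.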

Jota
  • Thank you @Frank, it does work. One question: does it remove the cases with NA in that factor variable across the whole dataset? – Pulse Jul 02 '13 at 00:48
  • The command is saying to remove the entire row every time `is.na(dat$Factor)` is `TRUE`. – Jota Jul 02 '13 at 00:50
  • @baptiste I changed the code to reflect your comment. I wasn't aware that it worked that way. Thanks for the comment. – Jota Jul 02 '13 at 00:55
  • @Frank, I am sorry, I do not completely understand. It removes entire rows every time `is.na(dat$Factor)` is `TRUE`; do you mean that I need to call this inside the ggplot function, or some other function? – Pulse Jul 05 '13 at 01:17
  • You could do that or you could create a new object (e.g. `newdat`) to contain your subset of the original data. Then, use the new object inside your ggplot function. – Jota Jul 05 '13 at 02:20
2

I'd use qplot instead of ggplot, like this:

qplot(x=column, data=subset(dataframe,!is.na(column)))

I hope this helps.
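Note that `qplot()` has been deprecated since ggplot2 3.4.0. A rough equivalent with `ggplot()` might look like the sketch below; the contents of `dataframe` are invented for illustration, and `geom_bar()` is assumed as the layer because the variable is a factor.

```r
library(ggplot2)

# Hypothetical data: a single factor column with one NA.
dataframe <- data.frame(column = factor(c("x", "y", NA, "x")))

# Same filter as in the qplot() call: keep rows where `column` is not NA.
plot_data <- subset(dataframe, !is.na(column))

p <- ggplot(plot_data, aes(x = column)) +
  geom_bar()
```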

Zeeshan
Giacomo
2

See the answers on this related thread: "NA's are being plotted in boxplot ggplot2".

In brief, instead of the usual:

ggplot(data=data)

use

ggplot(data=na.omit(data[,c("var1","var2",...)])) 

where var1, var2 etc are the variables you are plotting.
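To illustrate why the columns are selected before calling `na.omit()`, here is a small sketch with made-up data. The column `var3` and the choice of `geom_boxplot()` are assumptions for the example. Restricting to the plotted columns means an NA in an unrelated column does not cost a row.

```r
library(ggplot2)

# Hypothetical data: var3 is not plotted, but it also contains an NA.
data <- data.frame(
  var1 = factor(c("a", "b", NA, "a")),
  var2 = c(1.5, 2.0, 3.1, NA),
  var3 = c(NA, 10, 20, 30)
)

# Drop rows with an NA in either plotted column only.
plot_data <- na.omit(data[, c("var1", "var2")])

p <- ggplot(data = plot_data, aes(x = var1, y = var2)) +
  geom_boxplot()
```

With this data, `na.omit(data)` on the full data frame would also discard the first row because of the NA in `var3`, even though that column is never plotted.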

zx8754
Lynsey
  • Simple and clean solution. Assuming there is no difference between `na.omit()` and `drop_na()`, the latter would be idiomatic Tidyverse, as in: `ggplot( data = drop_na( data[ , c("var1" ,"var2" , ... ) ] ) )` – Clokman Mar 22 '23 at 04:26