20

I'm trying to draw a very simple boxplot in ggplot2 of species richness vs. land-use class. However, my data contain 2 NAs. For some reason they are being plotted, even though R treats them as NA. Any suggestions for removing them?

The code I'm using is:

ggplot(data, aes(x = luse, y = rich)) +
  geom_boxplot(mapping = NULL, data = NULL, stat = "boxplot", position = "dodge",
               outlier.colour = "red", outlier.shape = 16, outlier.size = 2,
               notch = FALSE, notchwidth = 0.5) +
  scale_x_discrete("luse", drop = TRUE) +
  geom_smooth(method = "loess", aes(group = 1))

However, the graph still includes the 2 NAs for luse. Unfortunately I cannot post images, but imagine an NA bar being added to my graph.

Julius Vainora
R. Solar
  • 19
    `ggplot(na.omit(data), aes(x=luse, y=rich)) + ...` – Roland Jun 17 '13 at 11:23
  • 24
    For a more general case: if the data contain variables other than the two being plotted, `na.omit(data)` will remove observations with missing values in any variable. This can have unintended consequences for your graphs and/or analysis. One could use `data=na.omit(data[,c("var1","var2",...)])`, where var1, var2, ... are the variables you require for your graph. – Maxim.K Mar 24 '14 at 10:41
  • 4
    +1 for @Maxim.K. I ran into this exact problem with a large data frame in which one variable had an extremely high proportion of NA values, and I couldn't quite work out the syntax to drop the NAs in just my variable of interest. Note that if you are only interested in one variable, as I was, the code above returns a vector; you must select at least 2 columns of the data.frame for it to work as written. – svannoy May 29 '16 at 00:10
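
To make the two comments above concrete, here is a small sketch with a made-up data frame. The column names `luse` and `rich` follow the question; the `other` column and all values are hypothetical. It shows that `na.omit()` on the whole data frame drops more rows than the plot needs, and that `drop = FALSE` is one way to keep a single-column selection as a data frame.

```r
# Toy data; `other` is a hypothetical extra column, all values are invented.
data <- data.frame(
  luse  = c("forest", "crop", NA, "forest", "crop"),
  rich  = c(12, 7, 9, 15, 6),
  other = c(1, NA, 3, 4, 5)
)

# na.omit() on the whole frame also drops row 2, where only `other` is missing.
all_complete <- na.omit(data)

# Restricting to the plotted columns drops only the row with NA in luse.
plot_complete <- na.omit(data[, c("luse", "rich")])

# Selecting one column with [ , ] returns a vector unless drop = FALSE is given.
one_col_vec <- data[, "luse"]
one_col_df  <- na.omit(data[, "luse", drop = FALSE])

nrow(all_complete)   # fewer rows than plot_complete
nrow(plot_complete)
```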

3 Answers

10

You could use the subset() function in the first line of your code:

ggplot(data=subset(data, !is.na(luse)), aes(x=luse, y=rich))+

as suggested in: Eliminating NAs from a ggplot
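
A self-contained sketch of this approach, using invented toy data in place of the asker's `data` (only the column names `luse` and `rich` come from the question; the values are assumptions). The plot object is built only if ggplot2 is installed.

```r
# Toy stand-in for the asker's data; values are invented.
data <- data.frame(
  luse = factor(c("forest", "crop", NA, "forest", "crop", NA)),
  rich = c(12, 7, 9, 15, 6, 11)
)

# Keep only the rows where the land-use class is known.
clean <- subset(data, !is.na(luse))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # The rows with NA in luse are gone, so no NA category reaches the x axis.
  p <- ggplot(clean, aes(x = luse, y = rich)) +
    geom_boxplot(outlier.colour = "red")
  # Type p (or print(p)) at the console to draw the plot.
}
```

Filtering into a named object such as `clean` is optional; passing `subset(data, !is.na(luse))` straight to `ggplot()` as in the answer should behave the same way.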

Uwe
3

Here is a formal answer using the comments above to incorporate !is.na() with filter() from tidyverse/dplyr. If you have a basic tidyverse operation such as filtering NAs, you can do it right in the ggplot call, as suggested, to avoid making a new data frame:

ggplot(data %>% filter(!is.na(luse)), aes(x = luse, y = rich)) + geom_boxplot()

user29609
0

You can also use the filter() function in dplyr/tidyverse:

data %>% filter(is.na(luse) == FALSE) %>% 
   ggplot(aes(x=luse, y=rich)) +
   geom_boxplot()

This way you don't have to create a new object.

  • Did you maybe mean `!is.na()` instead? Or do you want all the NAs? ;) Also, you do not necessarily need to write `is.na(x) == TRUE`, because `is.na()` already evaluates to a logical vector, which `filter()` can use directly. P.S. Welcome to Stack Overflow. – tjebo Feb 28 '18 at 01:37
  • Oh, yep. Typo, sorry. Thanks for catching that. Also, cool. I did not know you could just cast `is.na()` directly. – Luke McDonald Mar 01 '18 at 03:15
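
The point made in these comments, that `is.na()` already returns a logical vector, can be checked in base R. A small sketch with a made-up vector `x`:

```r
# Made-up vector with two missing values.
x <- c("forest", NA, "crop", NA)

is.na(x)                     # logical vector: FALSE TRUE FALSE TRUE

keep_a <- is.na(x) == FALSE  # the comparison form used in the answer
keep_b <- !is.na(x)          # the shorter negated form

identical(keep_a, keep_b)    # both mark the same non-missing entries
x[keep_b]                    # the non-missing values
```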