45

Very basic question here, as I'm just starting to use R. I'm trying to create a bar plot of factor counts in ggplot2, and when plotting I get 14 little colored blips representing my actual levels and then a massive grey bar at the end representing the 5000-ish NAs in the sample (it's survey data from a question that only applies to about 5% of the sample). I've tried the following code to no avail:

ggplot(data = MyData,aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin") 

The addition of the na.rm argument here has no apparent effect.

Meanwhile,

ggplot(data = na.omit(MyData),aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin") 

gives me

"Error: Aesthetics must either be length one, or the same length as the data"

as does affixing the na.omit() to the_variable, or both MyData and the_variable.

All I want to do is eliminate the giant NA bar from my graph, can someone please help me do this?

joran
Ben Eichler
  • 2
    It's really impossible to help without having your data. You need to provide a [small example](http://stackoverflow.com/q/5963269/324364) that we can actually run, so we are able to look at your actual data structure. – joran Jun 20 '13 at 14:33
  • 5
    Without seeing your data, you may be able to subset down to just the non-NA values for plotting purposes. Ie `MyData.sub <- MyData[!is.na(MyData)]`, then just plot the subset. I often do something similar to remove zeros. – dayne Jun 20 '13 at 14:35
  • Would it work to just refactor your fill variable? `fill = factor(the_variable)` – Fr. Jun 20 '13 at 14:59

7 Answers

56

You can use the function subset inside ggplot2. Try this

library(ggplot2)

data("iris")
iris$Sepal.Length[5:10] <- NA # create some NAs for this example

ggplot(data=subset(iris, !is.na(Sepal.Length)), aes(x=Sepal.Length)) + 
geom_bar(stat="bin")
rafa.pereira
  • 4
    Unfortunately, `iris` has no NAs .) – ikashnitsky Dec 01 '17 at 21:42
  • 2
    Ha! That's a nice way to treat the comment)) I guess, for almost any case there is a well suited dataset [from the R built-in ones](https://vincentarelbundock.github.io/Rdatasets/datasets.html) – ikashnitsky Dec 02 '17 at 13:16
  • @ikashnitsky Thanks for that table. A `hasNAs` column would have been very helpful though :) – BroVic Jun 24 '18 at 08:45
  • @mad If you are creating a plot with two columns, make sure to remove the `NA` values in both of them. Example: `subset(iris, !is.na(Sepal.Length) & !is.na(Sepal.Width))` – rafa.pereira Aug 09 '18 at 17:13
  • That's a great way to deal with `NA`s within `ggplot()`. Thanks @rafa.pereira – Sandy Jun 23 '21 at 17:27
30

Just an update to the answer of @rafa.pereira. Since ggplot2 is part of the tidyverse, it makes sense to use the convenient tidyverse functions to get rid of NAs.

library(tidyverse)
airquality %>% 
        drop_na(Ozone) %>%
        ggplot(aes(x = Ozone))+
        geom_bar(stat="bin")

Note that you can also use drop_na() without specifying columns; then every row with an NA in any column will be removed.
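
The difference between the two forms of drop_na() can be checked by counting rows. This is a minimal sketch using the same built-in airquality data; the object names ozone_only and any_column are made up for illustration.

```r
library(tidyr)

# Drop only the rows where Ozone is NA
ozone_only <- drop_na(airquality, Ozone)

# Drop every row that has an NA in any column (Ozone, Solar.R, ...)
any_column <- drop_na(airquality)

nrow(airquality)  # 153
nrow(ozone_only)  # 116
nrow(any_column)  # 111
```

Naming the column keeps the five extra rows that are only missing Solar.R, which matters if the plot does not use that column.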

ikashnitsky
  • I like this approach because it addresses the problem before it ever manifests into an actual problem; simply remove the `NA` values from the onset and you needn't worry about them any more. – Mus Feb 15 '21 at 12:50
26

Additionally, adding na.rm= TRUE to your geom_bar() will work.

ggplot(data = MyData,aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
   geom_bar(stat="bin", na.rm = TRUE)

I ran into this issue with a loop in a time series and this fixed it. The missing data are removed and the results are otherwise unaffected.

regents
12

Not sure if you have solved the problem yet. For this issue, you can use the "filter" function from the dplyr package. The idea is to keep only the observations/rows where the variable of interest is not NA, and then make the graph from those filtered observations. You can find my code below; note that the names of the data frame and the variable are copied from your question. I also assume you are familiar with the pipe operator.

library(tidyverse) 

MyData %>%
   filter(!is.na(the_variable)) %>%
     ggplot(aes(x= the_variable, fill=the_variable)) + 
        geom_bar(stat="bin") 

You should be able to remove the annoying NAs on your plot. Hope this works :)
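
Since the original data were never posted, here is a self-contained sketch of the same idea with made-up data. The contents of MyData below are purely hypothetical (a factor that is about 95% NA, as described in the question), and the sketch assumes a recent ggplot2 release, where geom_bar() counts rows by default and stat = "bin" only accepts continuous variables.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the survey data: a factor that is mostly NA
set.seed(1)
MyData <- data.frame(
  the_variable = factor(sample(c("a", "b", "c", NA), 200, replace = TRUE,
                               prob = c(0.02, 0.02, 0.01, 0.95)))
)

# Keep only the rows where the variable is observed
plot_data <- MyData %>%
  filter(!is.na(the_variable))

# geom_bar() with its default stat counts the rows per factor level
p <- ggplot(plot_data, aes(x = the_variable, fill = the_variable)) +
  geom_bar()

p
```

Because the NA rows are gone before ggplot() ever sees the data, there is nothing left to draw as a grey NA bar.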

Jay Chieh Kao
12

Try remove_missing instead with vars = "the_variable" (the column name goes in quotes, since vars expects a character vector). It is very important that you set the vars argument, otherwise remove_missing will remove all rows that contain an NA in any column! Setting na.rm = TRUE will suppress the warning message.

ggplot(data = remove_missing(MyData, na.rm = TRUE, vars = "the_variable"),aes(x= the_variable, fill=the_variable, na.rm = TRUE)) + 
       geom_bar(stat="bin") 
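
To see why the vars argument matters, here is a small sketch on the built-in airquality data (used only because MyData is not available; the object names with_vars and without_vars are made up).

```r
library(ggplot2)

# Only rows with an NA in Ozone are dropped; vars takes column names as strings
with_vars <- remove_missing(airquality, na.rm = TRUE, vars = "Ozone")

# Without vars, rows with an NA in any column are dropped
without_vars <- remove_missing(airquality, na.rm = TRUE)

nrow(with_vars)     # 116
nrow(without_vars)  # 111
```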
Bryan F
0

As I understand it, the error "Error: Aesthetics must either be length one, or the same length as the data" refers to the arguments passed to aes(x, y). I tried na.omit() and it worked just fine for me.

0

Another option is using the function complete.cases like this:

library(ggplot2)
# With NA
ggplot(airquality, aes(x = Ozone))+
  geom_bar(stat="bin")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> Warning: Removed 37 rows containing non-finite values (stat_bin).

# Remove NA using complete.cases
airquality_complete <- airquality[complete.cases(airquality), ]
ggplot(airquality_complete, aes(x = Ozone))+
  geom_bar(stat="bin")
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2022-08-25 with reprex v2.0.2

Quinten