
I am trying to plot a factor variable as a bar plot using ggplot2.

I create the variable like this:

survey$coffeeblack2[survey$coffee == 1] <- 2
survey$coffeeblack2[survey$coffee == 2] <- 1
survey$coffeeblack2[survey$coffee == 3] <- 1
survey$coffeeblack2[survey$coffee == 4] <- 1
survey$coffeeblack2[survey$coffee == 5] <- 0
survey$coffeeblack2[survey$coffee == 6] <- 1
survey$coffeeblack2[survey$coffee == "NA"] <- NA

survey$coffeeblack2 <- as.factor(survey$coffeeblack2)
summary(survey$coffeeblack2)

This summary() call gives the following (correct) output:

0    1    2     NA's 
139  186  107    4
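A more compact way to do the same recoding is a named lookup vector indexed by the codes. The sketch below uses a made-up `coffee` vector with illustrative values, not the survey data. Indexing with a missing code returns NA, so missing answers stay missing without a separate step.

```r
# Made-up example codes 1-6 plus one missing answer (illustrative only)
coffee <- c(1, 2, 3, 4, 5, 6, NA)

# Lookup table: names are the original codes, values are the new codes
recode_map <- c("1" = 2, "2" = 1, "3" = 1, "4" = 1, "5" = 0, "6" = 1)

# Index the lookup by the codes as strings; an NA code yields NA
coffeeblack2 <- unname(recode_map[as.character(coffee)])
print(coffeeblack2)
```

Assuming `survey$coffee` stores the numeric codes 1 to 6, the same line would be `survey$coffeeblack2 <- unname(recode_map[as.character(survey$coffee)])`.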

I use the following command to plot it:

ggplot(survey, aes(coffeeblack2)) + 
  geom_bar( aes(fill=..count..)) + 
  scale_fill_gradient("Count", low="green", high ="red") + 
  scale_x_discrete(labels = c("0" = "Non-Drinker", "1" = "Adder", "2" = "Black", "NA" = "NA"))

It gives the following output:

[image: the resulting bar plot]

The NAs are plotted, but they are labelled "Non-Drinker". I figured out how to remove them from the graph, but how do I get them correctly labelled as NA?

(I also tried removing `, "NA" = "NA"` from the labels and got the same result.)


Update with a minimal working example:

library(ggplot2)
a <- c(1,2,2,3,3,3,NA,NA)
a.f <- as.factor(a)
summary(a.f)  

ggplot(as.data.frame(a.f), aes(a.f)) + geom_bar( aes(fill=..count..)) +  scale_x_discrete(labels = c("1" = "One", "2" = "Two", "3" = "Three", "NA" = "NA")) 

[image: plot from the minimal example]

As the plot shows, the NAs are labelled "One".

Andy
  • Welcome to StackOverflow! Please read the info about how to give a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). This will make it much easier for others to help you. – Jaap Oct 20 '15 at 16:41
  • You can't use `x == 'NA'` to find missing values. You'll need `is.na(x)` to identify the missing values. – Minnow Oct 20 '15 at 16:45
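To illustrate the comment above with a small made-up vector: comparing with the string "NA" never matches a missing value (the comparison itself returns NA), while is.na() does find it. Because an NA in a logical index is tolerated when the replacement has length one, a line like `survey$coffeeblack2[survey$coffee == "NA"] <- NA` runs without error but changes nothing.

```r
x <- c(1, NA, 3)

# Comparing with the string "NA": the missing element gives NA, not TRUE
print(x == "NA")

# is.na() is the way to find missing values
print(is.na(x))

# An NA in a logical index is ignored for a length-one replacement,
# so this assignment runs but leaves every element unchanged
y <- c(10, 20, 30)
y[x == "NA"] <- 0
print(y)
```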

2 Answers


Set the factor labels outside of ggplot:

# factor() (not as.factor()) accepts the levels and labels arguments
survey$coffeeblack2 <- factor(survey$coffeeblack2,
    levels = c(0, 1, 2),  # the values found in the first argument
    labels = c("Non-Drinker", "Adder", "Black"))  # the labels to apply to those values

and let ggplot use the labels that are present in your data:

ggplot(survey, aes(coffeeblack2)) + 
  geom_bar( aes(fill=..count..)) + 
  scale_fill_gradient("Count", low="green", high ="red")

This is one of the specific purposes of the grammar of graphics: to separate data manipulation (matching labels to values) from the mapping of data features to plot features (i.e. data labels to axis labels).
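A self-contained sketch of this approach on made-up codes (not the survey data): factor() attaches the labels to the values, and missing values remain NA. If the goal is an explicit category named "NA", one option, which is not part of the answer above, is base R's addNA(). It turns the missing values into a real level, placed last, which can then be renamed.

```r
# Made-up codes, including one missing value
codes <- c(0, 1, 2, 1, NA)

# Attach labels to values in the data, before any plotting
f <- factor(codes, levels = c(0, 1, 2),
            labels = c("Non-Drinker", "Adder", "Black"))
print(levels(f))   # the three labels
print(is.na(f))    # the last element is still missing

# Optional: make NA an explicit level and give it a printable name
f_na <- addNA(f)
levels(f_na)[is.na(levels(f_na))] <- "NA"
print(levels(f_na))
```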

Jthorpe
#some data
DF <- iris
DF[8:10, "Species"] <- NA
DF$Species <- as.character(as.integer(DF$Species))
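Before plotting, it can help to count the values explicitly. This sketch rebuilds the same example data and tabulates it with useNA = "ifany", so the three missing values appear as their own entry rather than being dropped.

```r
# Same example data: iris species coded as "1", "2", "3", with three set to NA
DF <- iris
DF[8:10, "Species"] <- NA
DF$Species <- as.character(as.integer(DF$Species))

# Count each code, including missing values as a separate category
counts <- table(DF$Species, useNA = "ifany")
print(counts)
```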

Set breaks and labels like this:

library(ggplot2)
summary(DF$Species)
ggplot(DF, aes(x = Species)) + 
  geom_bar( aes(fill=..count..)) + 
  scale_fill_gradient("Count", low="green", high ="red") + 
  scale_x_discrete(breaks = c("1", "2", "3", NA),
                   labels = c("Non-Drinker", "Adder", "Black", "NA"))

[image: resulting plot]

Roland