
I'm struggling to plot the proportion of a variable across the levels of a factor in ggplot.

Taking the mtcars data as an example and borrowing part of a solution from this question, I came up with:

library(ggplot2)
library(scales)  # provides percent_format()

ggplot(mtcars, aes(x = as.factor(cyl))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent_format())

This graph gives me the proportion of each cyl category in the whole dataset.
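As a quick sanity check on what the bars show, the same shares can be computed in base R. This is only a sketch; the name `cyl_share` is introduced here for illustration and is not part of the original code.

```r
# Share of each cyl level in the whole dataset (what the bar heights represent)
cyl_share <- prop.table(table(mtcars$cyl))
print(round(cyl_share, 3))
```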

What I'd like to get, though, is the proportion of cars in each cyl category that have automatic transmission (the binary variable am).
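For reference, the target numbers can be sketched in base R before any plotting. One caveat: in mtcars, `am` is coded 0 = automatic and 1 = manual, so the mean of `am` within a group is the manual share, and the automatic share is its complement. The names `manual_share` and `automatic_share` are illustrative only.

```r
# Mean of a 0/1 variable within each group is the proportion of 1s (manual cars)
manual_share <- tapply(mtcars$am, mtcars$cyl, mean)

# am == 0 codes automatic transmission, so take the complement
automatic_share <- 1 - manual_share

print(round(rbind(manual_share, automatic_share), 3))
```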

On top of each bar I would like to add an error bar for the proportion.

Is it possible to do this with ggplot only? Or do I have to prepare a data frame of summaries first and plot it with the stat = "identity" option of geom_bar?

I found some examples on the Cookbook for R website, but they deal with a continuous y variable.

radek

1 Answer


I think it would be easier to make a new data frame and then use it for plotting. Here I calculated the proportions and the lower/upper confidence limits (taken from the prop.test() result).

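library(plyr)
library(ggplot2)

# One row per cyl: proportion with am == 1 and its prop.test() confidence interval
mt.new <- ddply(mtcars, .(cyl), summarise,
      prop = sum(am) / length(am),
      low = prop.test(sum(am), length(am))$conf.int[1],
      upper = prop.test(sum(am), length(am))$conf.int[2])

# Bars from the precomputed proportions, error bars from low/upper
ggplot(mt.new, aes(x = as.factor(cyl), y = prop, ymin = low, ymax = upper)) +
  geom_bar(stat = "identity") +
  geom_errorbar()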
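If a ggplot-only version is preferred, something along these lines may also work. It is a sketch, not a tested replacement for the summary data frame: `prop_ci` is a hypothetical helper written for this example, and it assumes a ggplot2 version where `stat_summary()` accepts `fun` (3.3.0 or later; older versions use `fun.y`). As in the code above, the proportion is that of `am == 1`, and `prop.test()` may warn about small group sizes.

```r
library(ggplot2)

# Hypothetical helper: proportion of 1s and the prop.test() interval,
# returned in the y/ymin/ymax layout that stat_summary(fun.data = ...) expects
prop_ci <- function(x) {
  ci <- prop.test(sum(x), length(x))$conf.int
  data.frame(y = mean(x), ymin = ci[1], ymax = ci[2])
}

# Bars: group means of the 0/1 variable; error bars: interval from prop_ci()
p <- ggplot(mtcars, aes(x = as.factor(cyl), y = am)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = prop_ci, geom = "errorbar", width = 0.2) +
  labs(x = "cyl", y = "proportion with am == 1")

print(p)
```

This keeps the raw data in the plot call, at the cost of hiding the computed numbers; the data frame approach is easier to inspect.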
Didzis Elferts