How can I plot the relative proportions of two groups using a fill aesthetic in ggplot2?

I am asking this question here because several other answers on this topic seem incorrect (ex1, ex2, and ex3), and Cross Validated seems to have functionally banned R-specific questions (CV meta). ..density.. is conceptually related to, but distinct from, proportions (ex4 and ex5), so the correct answer does not seem to involve density.

Example:

library(ggplot2)

set.seed(1200)
test <- data.frame(
  test1 = factor(sample(letters[1:2], 100, replace = TRUE, prob = c(.25, .75)),
                 ordered = TRUE, levels = letters[1:2]),
  test2 = factor(sample(letters[3:8], 100, replace = TRUE),
                 ordered = TRUE, levels = letters[3:8])
)
ggplot(test, aes(test2)) +
  geom_bar(aes(y = ..density.., group = test1, fill = test1), position = "dodge")
# For example, the plot shows level a x c slightly above .15, but a manual calculation gives .138:
counts <- with(test, table(test1, test2))
counts / matrix(rowSums(counts), nrow = 2, ncol = 6)  # divide each row by its row total

The one answer that seems to produce correct output either calculates the proportions outside of ggplot2 or requires a panel rather than a fill aesthetic.
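For reference, a minimal sketch of the "calculate it outside of ggplot2" route, rebuilding the `test` data frame from the example so the block runs on its own. The names `props` and `proportion` are chosen here for illustration; `prop.table(counts, margin = 1)` divides each row of the table by its row total, which is the same quantity as the manual calculation in the example.

```r
library(ggplot2)

set.seed(1200)
test <- data.frame(
  test1 = factor(sample(letters[1:2], 100, replace = TRUE, prob = c(.25, .75)),
                 ordered = TRUE, levels = letters[1:2]),
  test2 = factor(sample(letters[3:8], 100, replace = TRUE),
                 ordered = TRUE, levels = letters[3:8])
)

# Within-group proportions: each row (one level of test1) sums to 1.
counts <- with(test, table(test1, test2))
props  <- as.data.frame(prop.table(counts, margin = 1),
                        responseName = "proportion")

# Plot the precomputed values; stat = "identity" uses y exactly as given.
p <- ggplot(props, aes(test2, proportion, fill = test1)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

This gives the intended bar heights, but the proportions are computed before ggplot2 ever sees the data, which is what the question is trying to avoid.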

Edit: Digging into stat_bin shows that the function ultimately called is bin, but bin is only passed the values of the x aesthetic. Without rewriting stat_bin (or writing another stat_), the hack applied in the answer referenced above can be generalized to the fill aesthetic, in the absence of the group aesthetic, by using the following for the y aesthetic: y = ..count../sapply(fill, FUN=function(x) sum(count[fill == x])). This just replaces PANEL (the hidden column present at the end of StatBin) with fill. Presumably other hidden variables could get the same treatment.
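To make that expression concrete, here is a small base R sketch of what it computes. The data frame `computed` is a hypothetical stand-in for the per-bar output of the stat, with made-up values; it is not ggplot2's actual internal structure.

```r
# Hypothetical stand-in for the stat's computed data: one row per bar.
computed <- data.frame(
  fill  = rep(c("a", "b"), each = 3),
  count = c(2, 3, 5, 10, 20, 10),
  stringsAsFactors = FALSE
)

# The expression from the edit: divide each count by the total of its fill group.
prop <- with(computed,
             count / sapply(fill, FUN = function(x) sum(count[fill == x])))

# ave() yields the same per-group totals without the explicit sapply().
prop_ave <- with(computed, count / ave(count, fill, FUN = sum))

# Within each fill group the proportions sum to 1.
print(tapply(prop, computed$fill, sum))
```

The `sapply()` call recomputes the group total once per row, so the result has one denominator per bar regardless of how many bars each group has.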

russellpierce
  • How does this generalize to cases where I have, say, a plot split into groups and then faceted into panels? – RoyalTS Mar 05 '14 at 15:44
  • @RoyalTS: I believe the same problem applies because the issue is that there isn't (or wasn't at least) an appropriate stat_ function in ggplot2. I wrote a draft solution that works as a drop-in with ggplot2... but I'm not sure how solid it is, so I didn't post it. – russellpierce Mar 10 '14 at 18:54

1 Answer

This is an awful hack, but it seems to do what you want...

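# Divide each count by its group total. This assumes ..count.. holds the
# 6 bars of the first test1 level followed by the 6 bars of the second.
ggplot(test, aes(test2)) +
  geom_bar(aes(y = ..count.. / rep(c(sum(..count..[1:6]), sum(..count..[7:12])), each = 6),
               group = test1, fill = test1), position = "dodge") +
  scale_y_continuous(name = "proportion")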
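For comparison, a sketch that avoids the hard-coded index ranges. It assumes a ggplot2 version in which `stat_count()` (the default stat of `geom_bar()`) provides the computed variable `prop`, a groupwise proportion, and in which `after_stat()` is available (3.3.0 or later). It is offered as an alternative to the hack above, not as an account of how the hack works; `test` is rebuilt here so the block runs on its own.

```r
library(ggplot2)

set.seed(1200)
test <- data.frame(
  test1 = factor(sample(letters[1:2], 100, replace = TRUE, prob = c(.25, .75)),
                 ordered = TRUE, levels = letters[1:2]),
  test2 = factor(sample(letters[3:8], 100, replace = TRUE),
                 ordered = TRUE, levels = letters[3:8])
)

# prop is computed within each group, so group = test1 is what makes the
# bars belonging to one test1 level sum to 1.
p <- ggplot(test, aes(test2)) +
  geom_bar(aes(y = after_stat(prop), group = test1, fill = test1),
           position = "dodge") +
  scale_y_continuous(name = "proportion")
print(p)
```

Because the denominator is taken per group rather than from fixed positions, this should not depend on every group having exactly six bars.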
  • +1 even though it is an awful hack. How did you determine the underlying data structure of ..count.. in order to come up with this? Knowing that is key to coming up with anything that looks like a general solution. – russellpierce Jul 15 '13 at 17:51