8

If I run the following code:

data(mtcars)
ggplot(data=mtcars, aes(cyl))+
  geom_bar(aes(fill=as.factor(gear), y = (..count..)/sum(..count..)), position="dodge") + 
  scale_y_continuous(labels=percent)

I get this plot: [image]

However, what I really want to do is have each of the gear levels add up to 100%. So, gear is the subgroup I am looking at, and I want to know the distribution within each group.

I don't want to use facets and I don't want to melt the data either. Is there a way to do this?

vashts85
  • Here is the same question; its answer ends by building a new data frame, which is always a workable solution with `ggplot2`: http://stackoverflow.com/questions/36087904/combining-position-dodge-and-position-fill-in-ggplot2 – bVa May 03 '16 at 16:41
  • _"I don't want to do useful things that can help solve the problem"_ O_o – hrbrmstr May 03 '16 at 21:29
  • I just want something that is adaptable in a variety of situations so I can quickly plot certain variables against others. I'd like to build a function maybe, but I am having trouble even with `melt` and `facets` now. HALP.. – vashts85 May 03 '16 at 21:31

4 Answers

26

I was searching for an answer to this exact question. This is what I came up with, using information I pooled together from Stack Overflow and getting familiar (i.e., by trial and error) with ..x.., ..group.., and ..count.. from the Sebastian Sauer link provided in Simon's answer. It shouldn't require any packages other than ggplot2.

library(ggplot2)
ggplot(mtcars, aes(x=as.factor(cyl), fill=as.factor(gear)))+
  geom_bar(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..]), position="dodge" ) +
  geom_text(aes( y=..count../tapply(..count.., ..x.. ,sum)[..x..], label=scales::percent(..count../tapply(..count.., ..x.. ,sum)[..x..]) ),
            stat="count", position=position_dodge(0.9), vjust=-0.5)+
  ylab('Percent of Cylinder Group, %') +
  scale_y_continuous(labels = scales::percent)

Produces: [image]
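
To see what the `tapply(..count.., ..x.., sum)[..x..]` expression is doing, here is a minimal base R sketch that mimics it outside ggplot2. The vectors `count` and `x` are hand-built stand-ins (my assumption for illustration), not what ggplot2 literally passes around internally, and the table also contains the empty cyl 8 / gear 4 combination.

```r
# Stand-ins for ..count.. and ..x.. (assumed, built by hand for illustration)
tab   <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))
count <- tab$Freq              # one count per cyl/gear combination
x     <- as.integer(tab$cyl)   # 1, 2, 3 for cyl = 4, 6, 8

# Total count per x position, then look up the matching total for every bar
totals <- tapply(count, x, sum)
share  <- count / as.vector(totals)[x]

# Each cyl group now adds up to 1 (i.e. 100%)
tapply(share, x, sum)
```

Swapping `x` for an index built from `tab$gear` would normalise within gear instead, which is the idea behind using ..fill.. in place of ..x...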

Robin
  • Just as a side note for others who might be interested: if you want the percentages by the fill variable rather than by x, you can use ..fill.. instead of ..x.. – Benjamin Schlegel Jan 21 '22 at 15:23
8

First of all: Your code is not reproducible for me (not even after including library(ggplot2)). I am not sure if ..count.. is a fancy syntax I am not aware of, but in any case it would have been nicer if I had been able to reproduce it right away :-).

Having said that, I think what you are looking for is described in http://docs.ggplot2.org/current/geom_bar.html and, applied to your example, the code

library(ggplot2)
data(mtcars)
mtcars$gear <- as.factor(mtcars$gear)
ggplot(data=mtcars, aes(cyl))+
  geom_bar(aes(fill=as.factor(gear)), position="fill")

produces

[image]

Is this what you are looking for?


Afterthought: Learning melt() or one of its alternatives is a must. However, melt() from reshape2 has been superseded for most use cases by gather() from the tidyr package.

Make42
  • I think this is it; I have to test it out. It looks like if I changed it to `position="dodge"` then I would be able to see it within categories of `cyl` in a non-stacked format, right? – vashts85 May 04 '16 at 16:45
  • Follow-up: how would you add value labels to each portion with the following code: ggplot(data=mtcars, aes(cyl, y=(..count..)/sum(..count..)))+ geom_bar(aes(fill=as.factor(gear)), position="dodge")+ geom_text(aes(size=18, label = format(paste(round(100*(..count..)/sum(..count..),1), "%",sep=""), digits=1, drop0trailing=TRUE), y= (..count..)/sum(..count..) ), stat= "count") Mine is not working. – vashts85 May 04 '16 at 16:48
  • @vashts85: Firstly, `size=18` in `geom_text` can't be right. Secondly, write a new question in which you (a) explain what `..count..` means, and (b) give an image of what you would like to see - I am not able to recognize this from your code. – Make42 May 04 '16 at 17:03
  • I want to get the percentage of `gear` within each level of `cyl`. And then I want to add labels on top of it. In a sense, I am just trying to set up a workflow in R to create the most basic of charts you see in standard PPT presentations in business contexts. – vashts85 May 04 '16 at 17:09
  • @vashts85: "I want to get the percentage of gear within each level of cyl." I think you got that :-). For the rest: Please ask a new question. I am happy to have a go at answering it, if you link to it from here. If it's so standard, you may link to an image from the internet where it is shown. (Many business presentation "standards" suck, so I am not really complying with them.) – Make42 May 04 '16 at 17:22
  • Will do. Thank you so much for your help so far! – vashts85 May 04 '16 at 17:23
  • The `..count..` variable is made by `geom_bar` and you can use it directly as in the OP. See [here](http://stackoverflow.com/questions/14570293/special-variables-in-ggplot-count-density-etc) and [here](http://stackoverflow.com/questions/15556069/documentation-for-special-variables-in-ggplot-count-density-etc) for more info. – aosmith May 04 '16 at 17:42
  • OK, I've added my new question here: http://stackoverflow.com/questions/37054386/ggplot-function-to-facilitate-basic-graphing – vashts85 May 05 '16 at 15:24
  • The question you linked was deleted so I answered your question below. – Robin Mar 29 '18 at 23:02
  • Is there any way to do the fill color of this over a continuous variable? – Skyler Aug 18 '19 at 01:30
4

Here's a good resource on how to do this from Sebastian Sauer. The quickest way to solve your problem is Way 4, in which you substitute ..prop.. for (..count..)/sum(..count..):

# Dropping scale_y_continuous, since you do not define percent
ggplot(data=mtcars, aes(cyl))+
  # ..prop.. is computed per group (here: per gear level, via fill)
  geom_bar(aes(fill=as.factor(gear), y = ..prop..),
           position="dodge")

Another approach, which I use and is similar to Way 1 in the linked page, is to use dplyr to calculate the percentages and stat = 'identity' to use the y aesthetic in a bar graph:

library(dplyr)
library(ggplot2)

mtcars %>%
  mutate(gear = factor(gear)) %>%
  group_by(gear, cyl) %>%
  count() %>%
  group_by(gear) %>%
  mutate(percentage = n/sum(n)) %>%
  ggplot(aes(x = cyl, y = percentage, fill = gear)) +
    geom_bar(position = 'dodge', stat = 'identity')
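
As a quick sanity check on the dplyr result, the same within-gear percentages can be sketched in base R with `prop.table()`. This is only a cross-check, assuming `gear` is the group that should sum to 100%; the object names `counts` and `within_gear` are mine.

```r
# Cross-tabulate gear (rows) by cyl (columns)
counts <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# margin = 1 divides every row by its row total,
# so each gear level sums to 1 across the cyl categories
within_gear <- prop.table(counts, margin = 1)

round(within_gear, 3)
rowSums(within_gear)
```

Using `margin = 2` instead would make each `cyl` column sum to 1, which corresponds to the "percent within cylinder group" variant.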
  • I believe this should be the accepted solution, computing percentages across two categorical variables within `geom_bar` is cumbersome, it is much easier to do this with `dplyr` functions then move on to plotting. – Xavier GB Oct 09 '21 at 03:04
1

If I understand the question correctly, the goal is to make each gear sum to 100% (rather than each cyl summing to 100%). I made a small tweak to Robin's response to make this work.

Basically, in the aes() statements, change ..x.. to ..fill..:

ggplot(mtcars, aes(x=as.factor(cyl), fill=as.factor(gear)))+
  geom_bar(aes(y=..count../tapply(..count.., ..fill.. ,sum)[..fill..]), position="dodge") +
  geom_text(aes(y=..count../tapply(..count.., ..fill.. ,sum)[..fill..], 
                label=scales::percent(..count../tapply(..count.., ..fill.. ,sum)[..fill..])),
            stat="count", position=position_dodge(0.9), vjust=-0.5)+
  ylab('Percent of Gear Group, %') +
  scale_y_continuous(labels = scales::percent)

[image: produced plot with percentages by fill variable rather than grouping variable]

Hope this helps!

L.C.