I have a data set like this ->

library(ggplot2)

response <- c("Yes","No")
gend <- c("Female","Male")

purchase <- sample(response, 20, replace = TRUE)
gender <- sample(gend, 20, replace = TRUE)

df <- as.data.frame(purchase)
df <- cbind(df,gender)

so head(df) looks like this ->

  purchase gender
1      Yes Female
2       No   Male
3       No Female
4       No Female
5      Yes Female
6       No Female

Also, so you can validate my examples, here is table(df) for my particular sampling.
(please don't worry about matching my percentages)

         gender
purchase Female Male
     No       6    3
     Yes      4    7

I want a "histogram" showing Gender, but split by Purchase. I have gone this far ->

ggplot(df) + 
       geom_bar(aes(y = (..count..)/sum(..count..)),position = "dodge") + 
       aes(gender, fill = purchase)

which generates ->

[Figure: histogram with split bins, by percentage, but not the aggregate level I want]

The Y axis shows percentages as I want, but each bar is a percentage of the whole chart. What I want is for each of the two "Female" bars to be a percentage of its respective "Purchase" group. So in the chart above I would like the four bars to be 66%, 36%, 33%, 64%, in that order.

I have tried geom_histogram to no avail. I have searched SO, the ggplot2 documentation, and several books.

Regarding the suggestion to look at the previous question about facets; that does work, but I had hoped to keep the chart visually as it is above, as opposed to split into "two charts". So...

Anyone know how to do this?

Thanks.

Roman Luštrik
John Bennett
  • Possible duplicate of [percentage on y lab in a faceted ggplot barchart?](https://stackoverflow.com/questions/4725339/percentage-on-y-lab-in-a-faceted-ggplot-barchart) – lbusett Aug 17 '17 at 20:17
  • @LoBu, thanks for the suggestion. The facets will work, and I can get by with that if I need to, but I was hoping to keep the chart "intact" as it looks in the question. But definitely thanks, because I'm going to use that if I can't get what I'm hoping for. – John Bennett Aug 18 '17 at 06:12

2 Answers

Try something like this:

library(tidyverse)

df %>% 
  count(purchase, gender) %>% 
  ungroup() %>% 
  group_by(gender) %>% 
  mutate(prop = prop.table(n)) %>% 
  ggplot(aes(gender, prop, group = purchase)) + 
  geom_bar(aes(fill = purchase), stat = "identity", position = "dodge")

The first 5 lines create a column prop (for "proportion"), which holds each purchase value's share within its gender.

To get there, you first count each purchase by gender (similar to the output of table(df)). Ungrouping and then regrouping by gender alone makes prop.table(n) divide each count by its gender's total, which is the aggregation we want.
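As a cross-check, the same prop column can be sketched in base R, which avoids depending on which package's count() happens to be attached. This is only an illustration: the data frame is rebuilt by hand from the counts in the question's table(df), and counts is a name made up for the example.

```r
# Counts from the question's table(df), rebuilt by hand so the example is reproducible
df <- data.frame(
  purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
  gender   = c(rep("Female", 10), rep("Male", 10))
)

# One row per purchase/gender pair, with the count in Freq
counts <- as.data.frame(table(df))

# Divide each count by the total for its gender
counts$prop <- ave(counts$Freq, counts$gender, FUN = function(x) x / sum(x))

counts
```

Within each gender the prop values sum to 1, which is what the dodged bars in the plot show.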

TTNK
  • Thanks @TTNK, that works great, but it feels like it should be possible entirely within GGPlot2. I'm trying to hone my ggplot skills so I'm hoping to find a definitive "it can" or "it can't" be done. I'll definitely remember this as a fall back. Or for times I just want to use base graphing. – John Bennett Aug 18 '17 at 06:21
  • @TTNK I have tried this and I get this error Error: wt_var must be a single variable – sar Apr 12 '20 at 17:47
  • @TTNK when I use the original posters data with your code I get this error Error in FUN(X[[i]], ...) : object 'Yes' not found – sar Apr 12 '20 at 17:53

Regarding the percentages you want, is the denominator based on gender, or purchase? In the example given above, 66% for female & no purchase would be a result of 6 divided by the sum of no purchases (6+3) rather than the sum of all females (6+4).

It's definitely possible to plot that, but I'm not sure if the result would be intuitive to interpret. I got confused myself for a while.
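To make the two candidate denominators concrete, here is a small sketch using prop.table() on the counts from the question's table. The matrix is typed in by hand, and tab, by_purchase and by_gender are illustrative names only.

```r
# Counts copied from the table(df) output in the question
tab <- matrix(c(6, 4, 3, 7), nrow = 2,
              dimnames = list(purchase = c("No", "Yes"),
                              gender   = c("Female", "Male")))

# Denominator = purchase totals (each row sums to 1);
# this matches the percentages asked for in the question
by_purchase <- prop.table(tab, margin = 1)

# Denominator = gender totals (each column sums to 1)
by_gender <- prop.table(tab, margin = 2)

round(100 * by_purchase, 1)
round(100 * by_gender, 1)
```

With margin = 1 the Female/No cell is 6 / (6 + 3); with margin = 2 it is 6 / (6 + 4).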

The following hack makes use of the weight aesthetic. I've used purchase as the grouping variable here based on the expected output described in the question, though I think gender makes more sense (as per TTNK's answer above):

library(dplyr)   # group_by(), mutate(), n(), ungroup(), %>%
library(ggplot2)

df <- data.frame(purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
                 gender = c(rep("Female", 10), rep("Male", 10)))

ggplot(df %>% 
         group_by(purchase) %>% # change this to gender if that's the intended denominator
         mutate(w = 1/n()) %>%  # each row weighs 1 / (number of rows in its group)
         ungroup()) + 
  aes(gender, fill = purchase, weight = w) + 
  geom_bar(position = "dodge") + # bar height = sum of weights in the bar
  scale_y_continuous(name = "percent", labels = scales::percent)

bar plot with weights
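The reason the weights work is that geom_bar() sums the weight aesthetic within each bar instead of counting rows, so a per-row weight of 1 / (group size) turns each bar's height into that bar's share of its group. A base R sketch of the same arithmetic, with w and heights as illustrative names:

```r
df <- data.frame(
  purchase = c(rep("No", 6), rep("Yes", 4), rep("No", 3), rep("Yes", 7)),
  gender   = c(rep("Female", 10), rep("Male", 10))
)

# Each row gets weight 1 / (number of rows sharing its purchase value)
w <- 1 / ave(rep(1, nrow(df)), df$purchase, FUN = sum)

# Summing the weights per purchase/gender cell reproduces the bar heights
heights <- tapply(w, list(purchase = df$purchase, gender = df$gender), sum)

round(100 * heights, 1)
```

Swapping df$purchase for df$gender in the ave() call gives the gender-based denominator instead.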

Z.Lin
  • I get this error when running your script Error: n() should only be called in a data context – sar Apr 12 '20 at 17:51