I would like to create a barplot like this:

library(ggplot2)

# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")

However, instead of counts, I want the percentage of observations falling into each 'clarity' category within each cut category ('Fair', 'Good', 'Very Good', ...).

With this ...

# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + 
geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge")

I get percentages on the y-axis, but these percentages ignore the cut factor: each bar is a share of all observations. I want all the red bars to sum to 1, all the yellow bars to sum to 1, and so on.

Is there an easy way to make that work without having to prepare the data manually?

Thanks!

P.S.: This is a follow-up to this Stack Overflow question.

grueb

1 Answer

You could use sjp.xtab from the sjPlot package for that:

sjp.xtab(diamonds$clarity, 
         diamonds$cut, 
         showValueLabels = FALSE, 
         tableIndex = "row", 
         barPosition = "stack")

To prepare the data for stacked bars in which each clarity level sums to 100%, use row proportions (margin 1 in prop.table):

data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))

Thus, you could write:

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
  geom_bar(position = "stack", stat = "identity") +
  scale_y_continuous(labels=scales::percent)
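
The second argument of prop.table selects the margin over which the proportions are normalised. A small check of both settings, as a sketch using base R and the diamonds data from ggplot2 (the names tab, by_clarity and by_cut are only illustrative):

```r
library(ggplot2)  # only needed here for the diamonds data set

# Cross-tabulate clarity (rows) by cut (columns).
tab <- table(diamonds$clarity, diamonds$cut)

# margin = 1: proportions within each row, so every clarity level sums to 1.
by_clarity <- prop.table(tab, 1)

# margin = 2: proportions within each column, so every cut level sums to 1.
by_cut <- prop.table(tab, 2)

rowSums(by_clarity)  # one total per clarity level
colSums(by_cut)      # one total per cut level
```

Margin 1 suits the stacked plot, where each bar (one clarity level) should fill up to 100%. Margin 2 is the one that makes all bars of the same colour (one cut level) sum to 100%.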

Edit: This version makes each cut category (Fair, Good, ...) sum to 100%, by using margin 2 in prop.table and position = "dodge":

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),2))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
  geom_bar(position = "dodge", stat = "identity") +
  scale_y_continuous(labels=scales::percent)

or

sjp.xtab(diamonds$clarity, 
         diamonds$cut, 
         showValueLabels = FALSE, 
         tableIndex = "col")

Verifying the last example with dplyr, summing up percentages within each group:

library(dplyr)
mydf %>% group_by(Var2) %>% summarise(percsum = sum(Freq))

>        Var2 percsum
> 1      Fair       1
> 2      Good       1
> 3 Very Good       1
> 4   Premium       1
> 5     Ideal       1
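
If you prefer to keep the original column names rather than Var1/Var2, the same per-cut proportions can also be computed with dplyr. This is only an alternative sketch, not the approach used above; the name cut_props is made up for illustration, and geom_col assumes a reasonably recent ggplot2:

```r
library(ggplot2)
library(dplyr)

# Count each cut/clarity combination, then divide by the total of its cut group,
# so that the proportions within every cut level sum to 1.
cut_props <- diamonds %>%
  count(cut, clarity) %>%
  group_by(cut) %>%
  mutate(Freq = n / sum(n)) %>%
  ungroup()

# geom_col() plots the precomputed values as they are (like stat = "identity").
ggplot(cut_props, aes(clarity, Freq, fill = cut)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent)
```

Grouping by clarity instead of cut in the group_by step would give the row-wise version used for the stacked plot.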

(See this page for further plot options and examples for sjp.xtab.)

Daniel
  • I really like the way you create the proportion table and use it to plot percentages. I was looking for a similar solution for my Shiny app, where the variable is selected dynamically based on user input, but the plot does not update after the first selection: changing the selected variable has no effect. I tried creating mydf in a reactive function and then in a renderPlotly function, but in both cases the plot is not updated. Would you know where I might be going wrong? – user1412 Dec 12 '16 at 11:55