A minimal subset of my data is:

mydata <- read.table(header = TRUE, text= "
                     Product    Characteristic  Product_category
                     AA Functional  A
                     AB Functional  A
                     AB Portable    A
                     BA Portable    B
                     BA Quality B
                     BB Quality B
                     BA Bright  B
                     BB Sound   B
                     BB Sound   B
                     BC Sound   B
                     BC Sound   B
                     BC Work    B
                     CA Functional  C
                     CA Functional  C
                     CA Functional  C
                     CA Functional  C
                     CB Functional  C
                     CC Functional  C
                     CC Functional  C
                     CC Functional  C
                     CC Functional  C
                     CC Portable    C
                     CC Design  C
                     CD Quality C
                     CD Quality C
                     CD Output  C
                     CD Noise   C
                     CD Noise   C
                     CD Component   C
                     CD Component   C

                     ")

I want to make three bar plots, one for each of the three product categories, with Characteristic on the x axis and the count of each Characteristic on the y axis. I also want to stack each bar by Product. The code for the bar plot for, let's say, Product_category A is:

library(dplyr)
library(ggplot2)
mydata %>% filter(Product_category == "A") %>% 
  ggplot(aes(x=Characteristic, fill = Product)) + geom_bar(width = 0.2) + coord_flip()

This part is easy. I am struggling with two things. First, I want to order the stacked bars in descending order of the count of each Characteristic. This dataset is only a minimal subset of my data, so the bars may happen to appear ordered here, but in my actual dataset they are not. Second, I want to label each bar with a percentage computed within each product category: count(Characteristic)/sum(count(Characteristic)). I want my final graph to look something like this:

mydata %>% filter(Product_category == "A") %>% 
  group_by(Characteristic) %>%
  summarize(counts = n()) %>% arrange(counts) %>%
  mutate(Characteristic = factor(Characteristic, Characteristic), perc = counts/sum(counts)) %>%
  ggplot(aes(x=Characteristic, y = counts)) + 
  geom_bar(stat = "identity", width = 0.4) + 
  theme(axis.text.x=element_blank()) + 
  geom_text(aes(label = paste(round(perc*100, digits = 1),"%",sep = "")), hjust = -0.2, size = 2.8, position = position_dodge(width = 0.7), inherit.aes = TRUE) + 
  coord_flip()

The only difference is that I want each bar to be stacked by Product, so I can see the share of each Product within each Characteristic. I have experimented with many approaches, but each is verbose and still does not give the desired result. What is the tidiest way of doing this?

jay.sf
user3816784
  • I want to add that there are other posts here about adding labels, such as this one: https://stackoverflow.com/questions/30656846/draw-the-sum-value-above-the-stacked-bar-in-ggplot2 However, my problem is to sort the bars, add percentage labels, and stack the bars by Product, all at the same time. – user3816784 Jul 14 '18 at 13:50

1 Answer

You should be able to do this with a little bit of dplyr and ordering factors with forcats. Since, as you noted, there aren't a lot of observations in the category you filtered for, I took out the filtering just to illustrate with more data, and skipped a little of the specifics in your plot just for the sake of simplifying the example. The keys to this are setting up Characteristic and Product as factors, and then using Product to set fill, so you have stacked areas for each product within each characteristic.

There are a few other things you can simplify: geom_col is equivalent to geom_bar(stat = "identity"), and scales::percent will do the percentage formatting you have. In order to have the text in each bar piece, use position_stack with vjust = 0.5 to center the labels.

library(tidyverse)

mydata %>%
  # filter(Product_category == "A") %>%
  # one row per Characteristic/Product combination
  group_by(Characteristic, Product) %>%
  summarise(counts = n()) %>%
  # still grouped by Characteristic here, so perc is the share within each bar
  mutate(perc = round(counts / sum(counts), digits = 3)) %>%
  ungroup() %>%
  # order the bars by their total count
  mutate(Characteristic = as.factor(Characteristic) %>% fct_reorder(counts, .fun = sum)) %>%
  arrange(Characteristic, perc) %>%
  mutate(Product = as.factor(Product) %>% fct_reorder(perc, .desc = FALSE)) %>%
  ggplot(aes(x = Characteristic, y = counts, fill = Product)) +
    geom_col(position = "stack") +
    # center each label inside its stacked piece
    geom_text(aes(label = scales::percent(perc)), 
              position = position_stack(vjust = 0.5), size = 3) +
    coord_flip()

Created on 2018-07-14 by the reprex package (v0.2.0).

camille
  • Thanks Camille. I do not want the percentages to be within each Characteristic, but overall within each Product category. So the group_by before ggplot should contain only Characteristic and not Product. I also want only one percentage, that of the whole bar, displayed at the end of each bar rather than one for every sub-section corresponding to each product, so it's easy to see the bar total. But if I do that and then use fill=Product within ggplot, I get the error 'Error in FUN(X[[i]], ...) : object 'Product' not found' – user3816784 Jul 15 '18 at 05:20
  • So the formula for % should be in line with the following code: `mydata %>% group_by(Characteristic) %>% summarize(counts = n()) %>% arrange(counts) %>% ungroup() %>% mutate(Characteristic = factor(Characteristic, Characteristic), perc = counts/sum(counts)) %>% ggplot(aes(x=Characteristic, y = counts, fill=Product)) + geom_col(position = "stack") + theme(axis.text.x=element_blank()) + geom_text(aes(label = scales::percent(perc)), hjust = -0.2, size = 2.8, position = position_dodge(width = 0.7), inherit.aes = TRUE) + coord_flip()` Thanks for simplifying the code! – user3816784 Jul 15 '18 at 05:29
  • The trouble is that in order to label a bar with the count of a categorical variable, you need to group_by and `summarize(count = n())` before calling ggplot. But if you then try, within ggplot, to stack the bars or facet the graph by a second categorical variable that was not included in group_by, it throws an error. How can I overcome this? – user3816784 Jul 15 '18 at 06:14
  • I'm unclear now on how you would want to stack bars by product. I suppose you could use `group = Product` inside your `aes` but not fill...? Run the bit of code from your comment up through `summarize`, and you'll see that by grouping only on `Characteristic` you lose `Product` when you use `summarize`. Maybe you want to `mutate` instead of getting a true summary? – camille Jul 15 '18 at 15:27
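
Following up on the `mutate` suggestion above, here is a minimal sketch of one way this might be done. It is not from the original thread: it assumes the bar total is added with `mutate` so that `Product` is kept for the fill, and that the single per-bar label is drawn from a second, one-row-per-bar data frame passed to `geom_text`. The names `cat_b`, `pieces`, `totals` and `p` are made up for illustration; `cat_b` holds the category B rows copied from the question's sample data.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Category B rows copied from the question's sample data
cat_b <- data.frame(
  Product = c("BA", "BA", "BB", "BA", "BB", "BB", "BC", "BC", "BC"),
  Characteristic = c("Portable", "Quality", "Quality", "Bright",
                     "Sound", "Sound", "Sound", "Sound", "Work"),
  stringsAsFactors = FALSE
)

# One row per Characteristic/Product piece. The bar total is added with
# mutate() rather than summarise(), so the Product column is not dropped.
pieces <- cat_b %>%
  count(Characteristic, Product, name = "counts") %>%
  group_by(Characteristic) %>%
  mutate(total = sum(counts)) %>%
  ungroup() %>%
  # order the bars by their total count
  mutate(Characteristic = fct_reorder(Characteristic, total))

# One row per bar, used only for the label: share of the whole category
totals <- pieces %>%
  distinct(Characteristic, total) %>%
  mutate(perc = total / sum(total))

# fill = Product is set only in geom_col(), so geom_text() never looks
# for a Product column in the totals data frame
p <- ggplot(pieces, aes(x = Characteristic, y = counts)) +
  geom_col(aes(fill = Product), width = 0.4) +
  geom_text(data = totals,
            aes(y = total, label = paste0(round(perc * 100, 1), "%")),
            hjust = -0.2, size = 2.8) +
  coord_flip()
p
```

Keeping `fill = Product` out of the top-level `aes` is what avoids the "object 'Product' not found" error here, since the label layer uses a data frame without that column. For the full data, the same steps could be run after `filter(Product_category == "A")`, or with `Product_category` added to the grouping and a `facet_wrap`.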