
I'm trying to display percentage numbers as labels inside the bars of a stacked bar plot in ggplot2. I found another post from 3 years ago, but I'm not able to reproduce its result: How to draw stacked bars in ggplot2 that show percentages based on group?

The answer to that post is almost exactly what I'm trying to do.

Here is a simple example of my data:

```r
library(ggplot2)

df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'),
                'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))
ggplot(data=df, aes(x=sample, fill=class)) + 
    coord_flip() +
    geom_bar(position=position_fill(reverse=TRUE), width=0.7)
```


I'd like every bar segment to show its percentage/fraction, so in this case they would all be 33%. Ideally the values would be calculated on the fly, but I can also supply the percentages manually if necessary. Can anybody help?
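For the example data, the expected fractions can be checked before any plotting. A minimal base-R sketch, assuming `df` is the data frame defined above (redefined here with `rep()` so the snippet runs on its own): `table()` counts each sample/class combination, and `prop.table(..., margin = 1)` turns each row into fractions of that sample.

```r
# Same 12-row example data as above
df <- data.frame(
  sample = rep(c("cond1", "cond2", "cond3", "cond4"), each = 3),
  class  = rep(c("class1", "class2", "class3"), times = 4)
)

# Rows = samples, columns = classes; each row sums to 1
fractions <- prop.table(table(df$sample, df$class), margin = 1)

# As whole percentages: every cell is 33 for this data
round(100 * fractions)
```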

Side question: How can I reduce the space between the bars? I found many answers to that as well, but they suggest using the width parameter in position_fill(), which doesn't seem to exist anymore.
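The answers below don't cover the side question. In ggplot2, bar width is set by the `width` argument of `geom_bar()` itself (the example above already passes `width=0.7`), as a fraction of the spacing between neighbouring categories; `position_dodge()` takes a `width` argument, but `position_fill()` only has `vjust` and `reverse`. A minimal sketch, assuming the goal is simply narrower gaps: raise `width` toward 1 (at `width = 1` the bars touch).

```r
library(ggplot2)

df <- data.frame(
  sample = rep(c("cond1", "cond2", "cond3", "cond4"), each = 3),
  class  = rep(c("class1", "class2", "class3"), times = 4)
)

# width belongs to geom_bar(), not position_fill():
# 0.95 leaves a gap of 5% of the category spacing between bars
p <- ggplot(df, aes(x = sample, fill = class)) +
  coord_flip() +
  geom_bar(position = position_fill(reverse = TRUE), width = 0.95)
p
```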

Thanks so much!

EDIT:

So far, there are two answers that show exactly what I was asking for (big thanks for responding so quickly); however, they fail when I apply them to my real data. Here is the example data with one more element added to show what happens:

```r
df = data.frame('sample' = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4','cond1'),
                'class' = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3','class2'))
```

Essentially, I'd like to have only one label per class/condition combination.
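One way to get exactly one label per class/condition combination is to collapse the raw rows into counts first, so that each combination becomes a single row for `geom_text()` to label. A minimal base-R sketch on the 13-row data above; `Freq` is the column name `as.data.frame()` gives a table's counts, while `counts`, `pct` and `label` are names chosen for this illustration.

```r
df <- data.frame(
  sample = c("cond1", "cond1", "cond1", "cond2", "cond2", "cond2",
             "cond3", "cond3", "cond3", "cond4", "cond4", "cond4", "cond1"),
  class  = c("class1", "class2", "class3", "class1", "class2", "class3",
             "class1", "class2", "class3", "class1", "class2", "class3", "class2")
)

# One row per sample/class combination, with its count in Freq
counts <- as.data.frame(table(sample = df$sample, class = df$class))

# Fraction of each class within its sample, plus a text label
counts$pct <- ave(counts$Freq, counts$sample, FUN = function(n) n / sum(n))
counts$label <- paste0(round(100 * counts$pct), "%")
counts
```

Unlike `count()` or `.N`, `table()` also keeps combinations that never occur (with a count of 0); drop them with `counts[counts$Freq > 0, ]` if they shouldn't get a label.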

Community
fakechek
  • Possible duplicate of [ggplot replace count with percentage in geom\_bar](http://stackoverflow.com/questions/24776200/ggplot-replace-count-with-percentage-in-geom-bar) and [this question](http://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables) – Roman Apr 20 '17 at 13:12

3 Answers


I think what the OP wanted was labels on the individual sections of the bars. We can do this by using data.table to compute the counts and percentages (both numeric and formatted) within each sample, and then plotting with ggplot:

```r
library(ggplot2)
library(data.table)

# Collapse to one row per sample/class, then compute fractions within each sample
dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
                percent_fmt = paste0(round(count*100/sum(count)), "%"),
                percent_num = count/sum(count)
                ), by = sample]

ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +   
  geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
  # reverse = TRUE so the labels stack in the same order as the bars
  geom_text(aes(label = percent_fmt), position = position_stack(vjust = 0.5, reverse = TRUE)) + coord_flip()
```


Edit: Another solution, where you calculate the y position of each label during the aggregation, so we don't have to rely on position_stack(vjust = 0.5):

```r
dt <- setDT(df)[,list(count = .N), by = .(sample,class)][,list(class = class, count = count,
               percent_fmt = paste0(round(count*100/sum(count)), "%"),
               percent_num = count/sum(count),
               cum_pct = cumsum(count/sum(count)),
               # midpoint of each segment: average of the running total and the previous running total
               label_y = (cumsum(count/sum(count)) + cumsum(shift(count/sum(count), fill = 0))) / 2
), by = sample]

ggplot(data=dt, aes(x=sample, y= percent_num, fill=class)) +   
  geom_bar(position=position_fill(reverse=TRUE), stat = "identity", width=0.7) +
  geom_text(aes(label = percent_fmt, y = label_y)) + coord_flip()
```
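The `label_y` expression places each label at the midpoint of its segment: it averages the running total of the fractions with the running total shifted by one segment (the missing first value counted as 0), which simplifies to `cumsum(p) - p / 2`. This only lines up with the bars if the rows within each sample are in the same order as the stacked segments. A quick base-R check with made-up fractions (`p` is hypothetical, not taken from the thread):

```r
p <- c(0.25, 0.50, 0.25)  # hypothetical fractions of three classes in one sample

# Form used above: average of the running total and the shifted running total
shifted <- c(0, head(p, -1))
mid_shift <- (cumsum(p) + cumsum(shifted)) / 2

# Equivalent simpler form: running total minus half of each segment
mid_simple <- cumsum(p) - p / 2

mid_simple  # 0.125 0.500 0.875
```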
Mike H.
  • That's indeed what I was looking to do, thanks! There is something strange about it, though. Your example works perfectly, but when I apply it to my data (which has a bit more than the example), the output looks like this: [link](https://ibb.co/jDNTWQ). I think it is labelling every instance of a class that goes into the bar. But then I don't understand why the bars are lost and why the axis is completely off. – fakechek Apr 20 '17 at 13:32
  • Hmm, can you post more of your data? – Mike H. Apr 20 '17 at 13:35
  • I'll add it to the post. – fakechek Apr 20 '17 at 13:40
  • @fakechek see my updates. I now use `sum(count)` instead of `.N` in the second `data.table` chain, and I also collapse the data to one row per sample/class in the first step. I think this was the source of the issue. – Mike H. Apr 20 '17 at 13:53
  • Thank you so much, now it is perfect :) – fakechek Apr 20 '17 at 14:36

Here is a solution where you first calculate the percentages using dplyr and then plot them:

UPDATED:

```r
options(stringsAsFactors = FALSE)

df = data.frame(sample = c('cond1','cond1','cond1','cond2','cond2','cond2','cond3','cond3','cond3','cond4','cond4','cond4'), 
                class = c('class1','class2','class3','class1','class2','class3','class1','class2','class3','class1','class2','class3'))

library(dplyr) 
library(ggplot2)

df %>%
  # count how often each class occurs in each sample
  count(sample, class) %>% 
  group_by(sample) %>%
  # fraction of each class within its sample
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = sample, y = pct, fill = class)) + 
  coord_flip() +
  geom_col(width = 0.7) +
  geom_text(aes(label = paste0(round(pct * 100), '%')),
            position = position_stack(vjust = 0.5))
```


Jeroen Boeye
  • Thanks so much, unfortunately I have the same problem as with the other solution. Briefly, when I add more elements (class/condition combinations) that get grouped by the colour code, each of them gets its own label. Instead, I'm trying to add only one label per colour :/ – fakechek Apr 20 '17 at 13:47
  • I'll update the answer to be a bit more robust to more class/conditions combinations. – Jeroen Boeye Apr 20 '17 at 13:54
  • If you don't want to count the classes within each sample but rather keep only the unique combinations of classes and samples, you could add the line `distinct(sample, class)` at the start of the analysis (above `count(sample, class)`). With this alternative calculation, the percentages within each sample will always be equal. – Jeroen Boeye Apr 20 '17 at 14:14
  • Fantastic, thanks a lot for the great help! This is beautiful! – fakechek Apr 20 '17 at 14:37
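The `distinct()` variant suggested in the comments can be sketched on the 13-row data from the question's edit (the result name `res` is chosen for this illustration). After `distinct()`, the duplicated cond1/class2 row is dropped, so every class counts once and each sample's three classes get 1/3 each; without `distinct()`, cond1/class2 would get 50%.

```r
library(dplyr)

df <- data.frame(
  sample = c("cond1", "cond1", "cond1", "cond2", "cond2", "cond2",
             "cond3", "cond3", "cond3", "cond4", "cond4", "cond4", "cond1"),
  class  = c("class1", "class2", "class3", "class1", "class2", "class3",
             "class1", "class2", "class3", "class1", "class2", "class3", "class2")
)

res <- df %>%
  distinct(sample, class) %>%  # keep each sample/class combination once
  count(sample, class) %>%     # so n is always 1 here
  group_by(sample) %>%
  mutate(pct = n / sum(n))     # fraction of each class within its sample
res
```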

Use the scales package to label the y axis as percentages:

```r
library(ggplot2)
library(scales)

ggplot(data=df, aes(x=sample, fill=class)) +
  coord_flip() +
  geom_bar(position=position_fill(reverse=TRUE), width=0.7) +
  scale_y_continuous(labels = percent_format())
```
Axeman
Erdem Akkas
  • Thanks for the advice, that was something I also wanted to address later on. But actually, what I'm trying to do here is adding a label to every bar, e.g. using geom_text() or geom_label(), saying the percentage of each class in that condition. – fakechek Apr 20 '17 at 13:20
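Combining this answer with the comment above: `scales::percent()` can format the text of per-segment labels as well as the axis. A minimal sketch, assuming scales 1.0.0 or later (where `percent()` accepts an `accuracy` argument); the fractions are made up for illustration.

```r
library(scales)

fractions <- c(1/3, 0.25, 0.5)  # hypothetical per-segment fractions

# accuracy = 1 rounds to whole percentages
labels <- percent(fractions, accuracy = 1)
labels
```

Inside a plot, this would become something like `geom_text(aes(label = percent(pct, accuracy = 1)), position = position_stack(vjust = 0.5))`, with `pct` computed per sample as in the dplyr answer.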