
In my data, I counted the number of arguments in written texts from two different groups, and I want to compare the groups with a barplot. The groups are not the same size, so comparing absolute counts doesn't make sense; I need the relative frequency of each number of arguments within each group.

Here is some exemplary data:

df <- data.frame(c("A","A","A","B","B","B","B","B","B"),c(1,1,2,0,1,1,1,2,2))
colnames(df) = c("group","count")

When I use

ggplot(df,aes(fill=group,x=count)) + geom_bar(position="dodge")

I get this barplot with the absolute counts, which is not what I want:

[Image: plot with absolute counts]

Instead, I want a plot that looks like this:

[Image: plot with relative frequencies]

I created this plot with

df2 <- data.frame(c("A","A","A","B","B","B"),c(0,1,2,0,1,2),c(0,0.67,0.33,0.167,0.5,0.33))
colnames(df2) = c("group","count","relFreq")
ggplot(df2,aes(fill=group,x=count,y=relFreq)) + geom_bar(position="dodge",stat="identity")

In this minimal example I can calculate the relative frequencies pretty easily. I could also do this with my real data, but that would be too laborious in my opinion. Is there any way to do this with ggplot directly? I tried the solution from "Display frequency instead of count with geom_bar() in ggplot", but it gives me frequencies relative to all arguments rather than per group, and the relative heights of the bars don't change. I also tried "Plot relative frequencies with dodged bar plots in ggplot2", which is much closer to what I want, but it uses a continuous x-axis, which I don't want.

Quinten
paboe

2 Answers

3

Why not just summarise ahead of plotting?

library(tidyverse)

df %>%
  count(group, count) %>%
  mutate(n = n / sum(n), .by = 'group') %>%
  ggplot(aes(count, n, fill = group)) +
  geom_col(position = position_dodge(preserve = 'single'))
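As a rough cross-check of the summarising step above, the same per-group relative frequencies can be computed in base R. This is only a sketch using the example data from the question; the name `rel_freq` is made up for illustration.

```r
# Example data from the question
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "B", "B", "B"),
  count = c(1, 1, 2, 0, 1, 1, 1, 2, 2)
)

# Cross-tabulate group by count, then divide each row by its row total
# (margin = 1), which gives the relative frequencies within each group
rel_freq <- prop.table(table(df$group, df$count), margin = 1)

# Each row (group) sums to 1
round(rel_freq, 3)
```

If the values look right here, the bar heights in the plot should match them.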


Allan Cameron
2

As mentioned by @Allan Cameron in the comments, ..prop.. is deprecated, so you should use after_stat(prop) instead, like this:

library(ggplot2)
ggplot(df,aes(fill=group,x=count)) + 
  geom_bar(aes(y = after_stat(prop), group = group), position = position_dodge(preserve = "single"))


Original answer: you could map ..prop.. to the y aesthetic per group, without pre-calculating the values, like this:

library(ggplot2)
ggplot(df, aes(fill = group, x = count)) + 
  geom_bar(aes(y = ..prop.., group = group), position = "dodge")

Created on 2023-08-18 with reprex v2.0.2

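If a group has no observations for some count value (here, group A has none for count 0), the remaining bar at that position takes up the full width. To keep all bars the same width and leave an empty slot for the missing group, use position = position_dodge(preserve = "single") instead of position = "dodge".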
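Since the question mentions not wanting a continuous x-axis, here is a minimal sketch that combines the pieces above with a discrete axis. Wrapping count in factor() and labelling the y-axis as percentages are assumptions about what is wanted, not part of the original answer.

```r
library(ggplot2)

# Example data from the question
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "B", "B", "B"),
  count = c(1, 1, 2, 0, 1, 1, 1, 2, 2)
)

# factor(count) makes the x-axis discrete (an assumption about the desired axis);
# group = group makes prop sum to 1 within each group rather than within each bar
p <- ggplot(df, aes(x = factor(count), fill = group)) +
  geom_bar(
    aes(y = after_stat(prop), group = group),
    position = position_dodge(preserve = "single")
  ) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "count", y = "relative frequency")

p
```

Without group = group, prop would be computed separately for every bar, so each bar would have height 1.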

Quinten
  • Hi @AllanCameron, thanks for the suggestion! Yes, you are right! Summarizing the data as you did is of course the way to go. – Quinten Aug 18 '23 at 07:53
  • No, I think using `prop` in this case is very reasonable: your code is more concise, and this is a perfect use case for `prop`. I just find that a lot of folks tie themselves in knots trying to get ggplot to do data manipulations that would be trivial to do beforehand, and that in most cases a policy of 'manipulate your data into something that's easy to plot, then plot it' is better than 'try to get the plotting software to do it all for you'. – Allan Cameron Aug 18 '23 at 08:00
  • Thanks to both of you for the quick help! Next time I will try to manipulate my data beforehand :) But this time the "prop" solution works just fine for me and does exactly what I want. – paboe Aug 18 '23 at 08:16