TL;DR: You cannot and should not use a log scale with a stacked barplot. If you want to use a log scale, use a "dodged" barplot instead. You'll also have better luck to use geom_col
instead of geom_bar
here and set your fill=
variable as a factor.
Geom_col vs. geom_bar
Try using geom_col
in place of geom_bar
. You can use coord_flip()
if the direction is not to your liking. See here for reference, but the gist of the issue is that geom_bar
should be used when you want to plot against "count", and geom_col
should be used when you want to plot against "values". Here, your y-axis is "cases" (a value), so use geom_col
.
The Problem with log scales and Stacked Barplots
With that being said, u/Dave2e is absolutely correct. The plot you are getting makes sense, because the underlying math being done to calculate the y-axis values is: log10(x) + log10(y) + log10(z)
instead of what you expected, which was log10(x + y + z)
.
Let's use the numbers in your actual data frame for comparison here. In "group 1", you have the following:
index group color cases
3 1 1 5000
9 1 2 153
12 1 3 23
So on the y-axis what's happening is the total value of a stacked barplot (without a log scale) will be the sum of all. In other words:
> 5000 + 153 + 23
[1] 5176
This means that each of the bars represents the correct relative size, and when you add them up (or stack them up), the total size of the bar is equivalent to the total sum. Makes sense.
Now consider the same case, but for a log10 scale:
> log10(5000) + log10(153) + log10(23)
[1] 7.245389
Or, just about 17.5 million. The total height of the bar is still the sum of all individual bars (because that's what a stacked barplot is), and you can still compare the relative sizes, but the sum total of the individual logs does not equal the log of the sum:
>log10(5000 + 153 + 23)
[1] 3.713994
Suggested Way to Change your Plot
Moral of the story: you can still use a log scale to "stretch out" the small bars, but don't stack them. Use postion='dodge'
:
df %>% ggplot( aes(x= group,y= log10(cases),fill=as.factor(color) ) ) +
geom_col(position='dodge') +
theme_minimal()

Finally, position='dodge'
(or position=position_dodge(width=...)
) does not work with fill=color
, since df$color
is not a factor (it's numeric). This is also why your legend is showing a gradient for a categorical variable. That's why I used as.factor(color)
in the ggplot
call here, although you can also just apply that to the original dataset with df$color <- as.factor(df$color)
and do the same thing.