
I was confused about the math the last time I asked this, so here's another try. I want to combine a histogram with a smoothed density fit, and I want the y axis to be in percent.

I can't find a good way to get this result. Last time, I managed to find a way to scale the geom_bar to the same scale as geom_density, but that's the opposite of what I wanted.

This is my current code:

ggplot2::ggplot(iris, aes(Sepal.Length)) +
  geom_bar(stat="bin", aes(y=..density..)) +
  geom_density()


The density curve and the bar heights match up, but the y scale is not what I want: I want percentages on the y axis, not, well, densities.
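For reference, a density is not a share of observations: it is scaled so that the total area under the curve equals 1, so its height is measured in "proportion per cm" rather than in percent. A quick base-R check with `stats::density()`, assumed here to use the same default Gaussian kernel and bandwidth rule as `geom_density()`:

```r
# Kernel density estimate of sepal length (Gaussian kernel, bw = "nrd0" by default)
d <- density(iris$Sepal.Length)

# Peak height of the curve, in proportion per cm
peak <- max(d$y)

# Riemann-sum approximation of the area under the curve on the evaluation grid
step <- diff(d$x)[1]
area <- sum(d$y) * step

cat("peak density:", round(peak, 3), "\n")
cat("area under curve:", round(area, 3), "\n")
```

The area comes out at essentially 1, which is why the curve only lines up with bars that are also on the density scale.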

Some new attempts. We begin with a bar plot modified to show percentages instead of counts:

gg = ggplot2::ggplot(iris, aes(Sepal.Length)) +
  geom_bar(aes(y = ..count../sum(..count..))) +
  scale_y_continuous(name = "%", labels=scales::percent)


Then we try to add a geom_density to that and somehow get it to scale properly:

gg + geom_density()


gg + geom_density(aes(y=..count..))


gg + geom_density(aes(y=..scaled..))


gg + geom_density(aes(y=..density..))

Same as the first.

gg + geom_density(aes(y = ..count../sum(..count..)))


gg + geom_density(aes(y = ..count../n))


This seems to be off by about a factor of 10...

gg + geom_density(aes(y = ..count../n/10))

which gives the same result as:

gg + geom_density(aes(y = ..density../10))


But inserting numbers ad hoc seems like a bad idea.

One useful trick is to inspect the values ggplot2 computes for the plot. These are not stored in the plot object when you save it, but you can get them with ggplot_build():

gg_data = ggplot_build(gg + geom_density())
View(gg_data$data[[2]])  # computed data for layer 2, the geom_density() layer
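In ggplot2 versions that provide it, `layer_data()` is a shorter route to the same table: it builds the plot and returns the computed data for a single layer. A minimal sketch, assuming ggplot2 ≥ 3.3.0 (`after_stat()` stands in for the older `..count..` notation used above):

```r
library(ggplot2)

gg <- ggplot(iris, aes(Sepal.Length)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(name = "%", labels = scales::percent)

# layer_data() returns the computed data for one layer;
# layer 2 is the geom_density() layer added here
dens <- layer_data(gg + geom_density(), i = 2)

# stat_density() columns include x, density, count and n (among others)
head(dens[, c("x", "density", "count", "n")])
```

In this table `count` is `density * n`, which is why `..count../n` reproduces `..density..` in the attempts above.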

Since we know the density fit around x = 6 should be about .04 (4%), we can look through the ggplot2-calculated values for one that gets us there, and the only one I see is density/10.
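The factor of 10 is the bin width. A density is "proportion per cm", so the share of observations in an interval of width w is approximately density × w. The distinct `Sepal.Length` values are 0.1 cm apart, so each bar of the percentage plot effectively covers a 0.1 cm step, and density × 0.1 = density / 10. A base-R check with `hist()`, using breaks chosen here (an assumption for illustration) so that each 0.1 cm bin is centered on one measured value:

```r
# Bins of width 0.1 cm centered on the measured values 4.3, 4.4, ..., 7.9
binwidth <- 0.1
breaks <- seq(4.25, 7.95, by = binwidth)
h <- hist(iris$Sepal.Length, breaks = breaks, plot = FALSE)

# Share of irises in each bin, computed two ways
share_from_counts  <- h$counts / length(iris$Sepal.Length)
share_from_density <- h$density * diff(h$breaks)  # density × bin width

# Both give the same proportions, and they add up to 1 (100%)
all.equal(share_from_counts, share_from_density)
sum(share_from_density)
```

With a different bin width the conversion factor changes with it, so it is safer to multiply by the bin width than to hard-code 10.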

How do I get geom_density fit to scale to the same y axis as the modified geom_bar?

Bonus question: why are the bars grouped differently? In my first plot (geom_bar(stat="bin")) there are no spaces between the bars, but in the percentage plot there are.
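The difference most likely comes from the stat rather than the geom: `geom_bar()` on its own uses `stat_count()`, which draws one bar per distinct x value, each 0.9 times the spacing between values wide, so gaps remain; `geom_bar(stat = "bin")`, like `geom_histogram()`, splits the range into 30 bins by default and adjacent bins touch. A sketch comparing the rectangle extents via `layer_data()`, assuming a recent ggplot2 (`geom_histogram()` stands in for `geom_bar(stat = "bin")`):

```r
library(ggplot2)

p_count <- ggplot(iris, aes(Sepal.Length)) + geom_bar()        # stat_count(): one bar per distinct value
p_bin   <- ggplot(iris, aes(Sepal.Length)) + geom_histogram()  # stat_bin(): 30 bins by default

bars <- layer_data(p_count)
bins <- layer_data(p_bin)  # prints a message suggesting a better binwidth

# Width of each drawn rectangle
width_count <- unique(round(bars$xmax - bars$xmin, 6))  # about 0.9 * 0.1
width_bin   <- unique(round(bins$xmax - bins$xmin, 6))  # range / 29

# Gap between neighbouring rectangles: positive for stat_count, zero for stat_bin
gap_count <- min(bars$xmin[-1] - bars$xmax[-nrow(bars)])
gap_bin   <- max(abs(bins$xmin[-1] - bins$xmax[-nrow(bins)]))
```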

CoderGuy123
  • The comment that you got last time which is that you are trying to use two axes on the same graph still applies. I believe that ggplot is just now starting to support this. Also in the newer ggplot they are much more clear about the distinction between histograms and bar plots and you probably really want a histogram. I think the basic issue is that you are thinking that a density is a percent, but it is not. I'm really not sure why you say the first one is nonsensical. – Elin Jan 28 '17 at 11:31
  • I want a y axis that makes sense to most readers, not density, which makes no sense to most readers. – CoderGuy123 Jan 28 '17 at 17:30
  • Also see [Histogram with normal curve](https://stackoverflow.com/a/36344354/4241780), the area of the bars and under the density curve need to be scaled to match each other. – JWilliman Nov 28 '20 at 19:55

2 Answers


Here is an easy solution:

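library(scales) # ! important: provides percent()
library(ggplot2)
ggplot(iris, aes(Sepal.Length)) +
    # histogram on the density scale, with breaks 0.1 cm apart
    stat_bin(aes(y = after_stat(density)), breaks = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), by = .1), color = "white") +
    # kernel density estimate drawn as a line
    geom_line(stat = "density", size = 1) +
    # formats the y values (densities) as percentages
    scale_y_continuous(labels = percent, name = "percent") +
    theme_classic()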


AnnaZ
  • Thank you for pointing it out. I forgot to specify here **library(scales)**. I've corrected the code, now it should work! – AnnaZ Jun 15 '17 at 22:16
  • I'm not quite sure why the y axis makes any sense. The bars easily add up to more than 100%. How do we interpret the percent y axis? – Afiq Johari Jul 22 '20 at 07:49
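With breaks 0.1 cm apart, the bar heights in this answer are densities, so they sum to about 1 / 0.1 = 10, which the percent labels display as a total of roughly 1000%. One alternative, sketched under the assumption of ggplot2 ≥ 3.3.0 (for `after_stat()`), multiplies both layers by the bin width so each bar shows the share of irises in its bin and the curve sits on the same scale; `binwidth_cm` is an illustrative variable name, not part of either answer:

```r
library(ggplot2)

binwidth_cm <- 0.1  # bin width in cm; both layers are scaled by it

p <- ggplot(iris, aes(Sepal.Length)) +
  # density × bin width = share of observations in each bin
  geom_histogram(aes(y = after_stat(density * binwidth_cm)),
                 binwidth = binwidth_cm, boundary = 0.05, colour = "white") +
  # same scaling for the curve: expected share per bin of width binwidth_cm
  geom_density(aes(y = after_stat(density * binwidth_cm))) +
  scale_y_continuous(labels = scales::percent, name = "Percent of irises")

# The bar heights now add up to 100%
bar_total <- sum(layer_data(p, 1)$y)
```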

Try this

library(ggplot2)
ggplot(iris, aes(x = Sepal.Length)) +
  # histogram on the density scale; bins are 0.1 cm wide
  geom_histogram(binwidth = .1, aes(y = after_stat(density))) +
  geom_density() +
  # with 0.1 cm bins, density × 0.1 = share per bin, so 0.1 is labelled "1%"
  scale_y_continuous(breaks = c(0, .1, .2, .3, .4, .5, .6),
                     labels = c("0", "1%", "2%", "3%", "4%", "5%", "6%")) +
  ylab("Percent of Irises") +
  xlab("Sepal Length in Bins of .1 cm")

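I think your first example is what you want; you just need to change the labels so they read as percentages, so do that rather than messing around with the scaling. Because the bins are 0.1 cm wide, a density of 0.1 corresponds to 0.1 × 0.1 = 1% of the irises per bin, which is why 0.1 is labelled "1%".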
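The hand-written breaks and labels depend on the 0.1 cm bin width. A sketch of a more general variant keeps the relabeling idea but lets ggplot2 choose the breaks; `density_to_percent()` is a hypothetical helper that multiplies a density by the bin width before formatting it with `scales::percent()` (assuming the scales package is installed and ggplot2 ≥ 3.3.0):

```r
library(ggplot2)

binwidth_cm <- 0.1

# Hypothetical helper: density × bin width = share per bin, formatted as a whole percent
density_to_percent <- function(d) scales::percent(d * binwidth_cm, accuracy = 1)

p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = binwidth_cm) +
  geom_density() +
  scale_y_continuous(labels = density_to_percent) +
  ylab("Percent of irises") +
  xlab("Sepal length in bins of 0.1 cm")

density_to_percent(c(0, 0.1, 0.4))
```

Changing `binwidth_cm` updates both the bins and the labels together, so the axis stays consistent.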

Elin