
How does stat_summary_bin bin the data when we use the bins argument?

ggplot(df, aes(x = x, y = y)) +
  stat_summary_bin(fun = 'mean', bins = 100,
                   color = 'orange', size = 2, geom = 'point') +
  geom_smooth(method = 'lm') +
  theme_minimal()

Does it split the x axis into bins of equal width, or does each bin hold an equal number of observations?

I thought stat_summary_bin splits the x axis into equally spaced bins. But the following plot is the result of the above code, and the points don't seem to be equally spaced along the x axis.

(Image: stat_summary_bin)

Aleiem
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It seems like the bins are evenly spaced, but some of your bins might just be empty, so no point is drawn. – MrFlick Jan 14 '21 at 00:11

1 Answer


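stat_summary_bin splits the x axis into equal-width bins: 100 in your code (bins=100), 20 in the example below. It then plots one point per bin, at the mean() of the y values of all observations that fall within that bin. A bin that contains no observations gets no point, which is why the points can look unevenly spaced along the x axis.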

You can observe this behaviour in the following plot

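library(tidyverse)
iris %>% 
  ggplot(aes(x = Sepal.Length, y = Sepal.Width)) +
  # one orange point per non-empty bin, at the mean Sepal.Width in that bin
  stat_summary_bin(fun = 'mean', bins = 20,
                   color = 'orange', size = 2, geom = 'point') +
  # raw observations, for comparison with the binned means
  geom_point()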

Run it with and without the geom_point() layer to compare the binned means with the raw observations.
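To see why equal-width bins can still give unevenly spaced points, here is a rough base-R sketch of the same idea. It is a hand-rolled approximation, not ggplot2's actual code, and the exact break positions ggplot2 chooses may differ. The variable names (n_bins, bin_width, counts, bin_means) are only illustrative.

```r
# Hand-rolled equal-width binning in base R (an approximation, not ggplot2's code)
x <- iris$Sepal.Length
y <- iris$Sepal.Width
n_bins <- 100  # same value as bins = 100 in the question

# Every bin has the same width: the x range divided by the number of bins
bin_width <- (max(x) - min(x)) / n_bins

# Bin index for each observation (1..n_bins); pmin() keeps max(x) in the last bin
bin <- pmin(floor((x - min(x)) / bin_width) + 1, n_bins)

# Observations per bin, including bins that received none
counts <- tabulate(bin, nbins = n_bins)

# Mean of y per non-empty bin; an empty bin has no mean, so nothing to plot
bin_means <- tapply(y, bin, mean)

n_nonempty <- sum(counts > 0)
n_empty <- sum(counts == 0)
cat("non-empty bins:", n_nonempty, " empty bins:", n_empty, "\n")
```

With many narrow bins and relatively few distinct x values, most bins end up empty, so the plotted means follow the gaps in the data even though the bins themselves are equally wide.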

https://ggplot2.tidyverse.org/reference/stat_summary.html

M.Viking
  • This is the relevant code: 1) https://github.com/tidyverse/ggplot2/blob/master/R/stat-summary-bin.R 2) https://rdrr.io/github/tidyverse/ggplot2/src/R/stat-bin2d.r – M.Viking Jan 14 '21 at 15:40