What is ggplot - geom_histogram - boundary doing

Question

I am using the moderndive package and its bowl data set to compare different presentation methods. Setting the boundary argument appears to change the location of the data distribution. What am I missing about how this argument is used?

library(moderndive)  # bowl data set, rep_sample_n()
library(dplyr)       # %>%, group_by(), summarise(), mutate()
library(ggplot2)
library(gridExtra)   # grid.arrange()

vs <- bowl %>%
      rep_sample_n(size = 50, reps = 10000) %>%
      group_by(replicate) %>%
      summarise(num_red = sum(color == "red")) %>%
      mutate(prop_red = num_red / 50) %>%
      mutate(prop_white = 1 - prop_red)

c1 <- ggplot(vs, aes(x = prop_red)) +
      geom_histogram(color = "white", binwidth = 0.025) + labs(title = "c1")

c2 <- ggplot(vs, aes(x = prop_red)) +
      geom_histogram(color = "white", binwidth = 0.025, boundary = 0.4)

grid.arrange(c1, c2, nrow = 1)

This SO question may be helpful: https://stackoverflow.com/questions/41486027/ggplot2-how-to-align-the-bars-of-a-histogram-with-the-x-axis — Peter, May 24 '20 at 17:00

score 0 · Answer 1 · answered Nov 09 '20 at 01:40

The boundary argument is a bin position specifier: it fixes where one bin break falls, and the remaining breaks follow at whole multiples of binwidth from it. In c2 you are forcing a bin break at 0.4, whereas in c1 ggplot2 chooses the break positions itself from the 0.025 binwidth.
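To make the two sets of break positions concrete, here is a minimal base R sketch. It assumes that, when neither boundary nor center is given, ggplot2 centres the bins on multiples of binwidth (so the breaks sit at binwidth / 2 plus whole multiples of binwidth); that is my reading of the default and worth checking against the stat_bin documentation. The index ranges 12:19 and -4:4 are arbitrary choices that cover roughly 0.30 to 0.50.

```r
binwidth <- 0.025

# Assumed default alignment: bins centred on multiples of binwidth,
# so breaks fall half a bin width away from those multiples.
default_breaks <- binwidth / 2 + binwidth * (12:19)

# With boundary = 0.4, one break is pinned at 0.4 and the others
# follow at whole multiples of binwidth on either side.
boundary_breaks <- 0.4 + binwidth * (-4:4)

print(default_breaks)
print(boundary_breaks)
```

Under that assumption, every break in c2 sits half a bin width (0.0125) away from the nearest break in c1, so each bar covers a different interval of prop_red.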

Since your distribution is discontinuous (prop_red can only take values that are multiples of 1/50 = 0.02, although this is hard to see with your 0.025 binwidth), the position of the bins makes a big difference to the histogram counts. See the third plot below, produced by adding the following to your code:

c3 <- ggplot(vs, aes(x = prop_red)) +
      geom_histogram(color = "white", binwidth = 0.0125)

[Figure: third plot, with the smaller binwidth, showing the discontinuity]
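The effect can also be checked without plotting. The sketch below counts how many of the possible prop_red values land in each 0.025-wide bin under the two alignments. It works in thousandths (prop_red * 1000) so that every number is exactly representable, and it assumes right-closed bins and a default alignment with bins centred on multiples of binwidth; both are my understanding of ggplot2's defaults rather than something stated in the question. The range of 10 to 25 red balls is an arbitrary illustrative window.

```r
# Work in thousandths: prop_red = num_red / 50 becomes num_red * 20,
# and binwidth 0.025 becomes 25.
values <- (10:25) * 20   # 200, 220, ..., 500 (prop_red 0.20 to 0.50)

# Number of possible values falling in each right-closed bin (a, b].
values_per_bin <- function(breaks) {
  as.vector(table(cut(values, breaks = breaks, right = TRUE)))
}

# Assumed default alignment: breaks at 12.5 plus multiples of 25.
default_bins <- values_per_bin(seq(187.5, 512.5, by = 25))

# boundary = 0.4 alignment: breaks at multiples of 25 (400 is one of them).
boundary_bins <- values_per_bin(seq(175, 500, by = 25))

print(default_bins)
print(boundary_bins)
```

In both cases most bins hold one possible value and every few bins hold two, but the doubled-up bins differ: with the assumed default they pair 0.24/0.26, 0.34/0.36 and 0.44/0.46, while with boundary = 0.4 they pair 0.28/0.30, 0.38/0.40 and 0.48/0.50. The tall bars therefore appear in different places, which makes the distribution look as if it has moved.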
