
I would like to know the default origin of the first bin in a histogram created with ggplot2 for a given bin width. Unfortunately, I did not find any information about this in the help pages for geom_histogram, geom_bar, and stat_bin. Below is a minimal example of a histogram with ggplot2.

 library(ggplot2)
 x <- rnorm(25)
 binwidth <- (range(x)[2]-range(x)[1])/10
 ggplot(data.frame(x=x), aes(x = x)) +
   geom_histogram(aes(y = ..density..), binwidth = binwidth)
Nussig

1 Answer


By default, the histogram is centered at 0, and the first bar's x limits are at -0.5*binwidth and 0.5*binwidth. From there, the bars continue with width = binwidth in both directions until they reach the minimum and maximum of the data. Or, if your data are all > 0, they start at the first (k+0.5)*binwidth (k an integer) whose bar contains data.
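To make the rule concrete, here is a minimal base-R sketch of it. It assumes, as described above, that bins are centered on integer multiples of binwidth. The helper name `bin_edges_for` is made up for illustration and is not a ggplot2 function, and values falling exactly on an edge are not treated specially here.

```r
# Hypothetical helper (not part of ggplot2): edges of the bin that would hold
# `v`, assuming bins are centered on integer multiples of `binwidth`.
bin_edges_for <- function(v, binwidth) {
  k <- floor(v / binwidth + 0.5)  # index of the nearest bin center
  c(lower = (k - 0.5) * binwidth, upper = (k + 0.5) * binwidth)
}

print(bin_edges_for(0.3, binwidth = 1))  # the bin centered at 0
print(bin_edges_for(2.7, binwidth = 1))  # the bin centered at 3
```

With binwidth = 1, a value of 0.3 lands in the bar spanning -0.5 to 0.5, and 2.7 lands in the bar spanning 2.5 to 3.5.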

For your example (using a set.seed for reproducibility):

set.seed(1)
x <- rnorm(25)
binwidth <- (range(x)[2]-range(x)[1])/10
p <- ggplot(data.frame(x=x), aes(x = x)) +
   geom_histogram(aes(y = ..density..), binwidth = binwidth)

We can get the breaks out by using:

x1 <- ggplot_build(p)$data

giving us our breaks:

x1[[1]]$x
 [1] -2.4764874 -2.0954894 -1.7144913 -1.3334932 -0.9524952 -0.5714971 -0.1904990  0.1904990  0.5714971
[10]  0.9524952  1.3334932  1.7144913  2.0954894

So, to get the minimum, we need to round the lowest value of the data down to a half-integer multiple of binwidth, i.e. a value of the form (k + 0.5)*binwidth (NB: I'm sure there is a better formula, but this works):

binwidth*(floor((min(x)-binwidth/2)/binwidth)+0.5)
-2.476487

similarly the maximum is:

binwidth*(ceiling((max(x)+binwidth/2)/binwidth)+0.5)
2.095489
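As a cross-check, the two formulas can be wrapped in a small base-R helper that rebuilds the whole sequence of values between the minimum and the maximum. This is only a sketch: `half_step_grid` is a made-up name, not something ggplot2 provides, and it simply steps from the first formula's result to the second's in increments of binwidth.

```r
# Hypothetical helper (not part of ggplot2): the grid of values running from
# the "minimum" formula to the "maximum" formula in steps of binwidth.
half_step_grid <- function(x, binwidth) {
  lo <- binwidth * (floor((min(x) - binwidth / 2) / binwidth) + 0.5)
  hi <- binwidth * (ceiling((max(x) + binwidth / 2) / binwidth) + 0.5)
  # Number of steps from lo to hi; round() guards against floating-point drift
  n <- round((hi - lo) / binwidth)
  lo + binwidth * (0:n)
}

set.seed(1)
x <- rnorm(25)
binwidth <- diff(range(x)) / 10
grid <- half_step_grid(x, binwidth)
print(grid)
```

For this seed the grid has 13 values and runs from -2.476487 to 2.095489, in line with the output of x1[[1]]$x shown earlier.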
jeremycg
  • Related answer, which shows how to get ``xmin`` along the lines of jeremycg's answer above: https://stackoverflow.com/questions/7740503/getting-frequency-values-from-histogram-in-r/47137411#47137411 – PatrickT Dec 06 '17 at 07:48