Using ggplot2 in R, I want to plot a histogram that starts strictly at the minimum value of the dataset and ends strictly at the maximum value.

When I add vertical lines at the minimum and maximum, the histogram bins overlap those values. I have tried shrinking the bins, changing their number, and reducing the space between them, but nothing helped.

bins = 5
bwidth = (max(data$deltaQ) - min(data$deltaQ)) / bins
ggplot(data = data) +
  geom_histogram(
    mapping = aes(x = deltaQ),
    binwidth = bwidth,
    na.rm = TRUE,
    fill = "yellow",
    color = "black",
    position = "stack",  # alternatives: "identity", "dodge"
    boundary = 0
  ) +
  geom_vline(xintercept = min(data$deltaQ), color = "green", na.rm = TRUE, mapping = aes(size = 5)) +
  geom_vline(xintercept = max(data$deltaQ), color = "green", na.rm = TRUE, mapping = aes(size = 5)) +
  geom_vline(mapping = aes(size = 5), xintercept = min(data$deltaQMin), color = "red", na.rm = TRUE, linetype = "longdash") +
  geom_vline(mapping = aes(size = 5), xintercept = max(data$deltaQMin), color = "red", na.rm = TRUE, linetype = "longdash") +
  geom_vline(mapping = aes(size = 5), xintercept = max(data$deltaQMax), color = "red", na.rm = TRUE, linetype = "longdash") +
  geom_vline(mapping = aes(size = 5), xintercept = min(data$deltaQMax), color = "red", na.rm = TRUE, linetype = "longdash") +
  xlim(-50, 50)

As far as I can tell, hist() and geom_histogram() currently center a bin on the minimum and on the maximum, which causes the overlap. I need to rule out any bin crossing the minimum or maximum value.

ezop
  • We don't have your data, so we can't run this code, and we can't see any output, so we don't know exactly what problem you're describing. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R post that is easy to help with. – camille May 19 '19 at 21:11
  • Also, you can use `xintercept` inside the `aes` of `geom_vline`. Instead of using the same geom 6 times, you'd probably be better off reshaping your data to fit the ggplot paradigm and only calling that geom once – camille May 19 '19 at 21:12
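
The reshaping suggested in the comment above could look roughly like the sketch below. The original data was never posted, so the `data` frame here is a made-up stand-in, and the helper names `vlines`, `x`, and `kind` are hypothetical. The idea is to put one row per reference line into a small data frame and call `geom_vline` once.

```r
library(ggplot2)

# Made-up stand-in for the asker's data (assumption: three numeric columns).
set.seed(1)
data <- data.frame(deltaQ = runif(100, -30, 30))
data$deltaQMin <- data$deltaQ - 5
data$deltaQMax <- data$deltaQ + 5

# One row per reference line: min and max of each column.
vlines <- data.frame(
  x    = c(range(data$deltaQ), range(data$deltaQMin), range(data$deltaQMax)),
  kind = rep(c("deltaQ", "deltaQMin", "deltaQMax"), each = 2)
)

# A single geom_vline call draws all six lines, colored by column.
p <- ggplot(data) +
  geom_histogram(aes(x = deltaQ), bins = 5, fill = "yellow", color = "black") +
  geom_vline(data = vlines, aes(xintercept = x, color = kind))
```

Printing `p` draws the plot; the color legend then identifies which column each pair of lines belongs to.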

1 Answer

Try setting the `boundary` argument to the min() or max() of the data in your call to geom_histogram.

Using the diamonds dataset from ggplot2, you can see that setting the boundary to min(diamonds$carat) gives you boundaries at the minimum and maximum values of the data. max(diamonds$carat) does the same.

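library(tidyverse)

# Keep only diamonds of at most 1 carat so the range is easy to see
data(diamonds)
diamonds <- filter(diamonds, carat <= 1)

# Align the bin edges with the minimum of the data, then mark min and max
ggplot(diamonds, aes(x = carat)) +
  geom_histogram(boundary = min(diamonds$carat)) +
  geom_vline(aes(xintercept = min(carat)), color = 'red') +
  geom_vline(aes(xintercept = max(carat)), color = 'red')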

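
If a fixed number of bins is wanted, as in the question (`bins = 5`), another option worth trying is to pass explicit bin edges through the `breaks` argument of `geom_histogram`. This is only a sketch and not part of the original answer. The names `d`, `n_bins`, and `brks` are illustrative, and the diamonds subset simply mirrors the example above.

```r
library(ggplot2)

# Same subset as in the answer, built with base R.
d <- subset(diamonds, carat <= 1)

# n_bins equal-width bins whose outer edges are exactly min and max.
n_bins <- 5
brks <- seq(min(d$carat), max(d$carat), length.out = n_bins + 1)

p <- ggplot(d, aes(x = carat)) +
  geom_histogram(breaks = brks, fill = "yellow", color = "black") +
  geom_vline(xintercept = range(d$carat), color = "red")

# Inspect the computed bins: xmin/xmax are the edges, count the bar heights.
bins <- layer_data(p, 1)
bins[, c("xmin", "xmax", "count")]
```

Because `breaks` overrides `binwidth`, `bins`, `center`, and `boundary`, the first bar should start at the minimum and the last bar should end at the maximum.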

  • Thank you, Christopher. That helps. By the way, I have noticed that `boundary` removes the last bin completely. Is there any way to move all bins closer to each other or shrink them so I don't lose the last bin? – ezop May 19 '19 at 20:43
  • You aren't losing that last bin of data: if you remove the `boundary` argument, the bins **do** go beyond the boundary and you have one extra showing up, yes, but the data isn't removed when you add the `boundary` argument. The bins are just reshaped to fit all the data into the boundaries, as you can see in my image. There will be n - 1 bins, where n = the number of bins specified (or 30 as per default), but the data should still be shown. The `geom_vline` calls should provide a sanity check to show you that the data is still included. – Christopher Dudley May 19 '19 at 21:06
  • You can check this with a simple histogram of a uniformly distributed dataset of 20 or so numbers; when you plot the histogram without the boundaries, the bins will extend beyond the vlines, but when you add the boundaries, they fit within the vlines. You can count how many data points are in both histograms by hand and they should be equal, but the one with the boundaries will have one less bin. – Christopher Dudley May 19 '19 at 21:19
  • In case you wanted the vertical red lines to touch the limits of the plot, you can use `scale_x_continuous(expand = c(0,0))` – zeehio May 20 '19 at 06:29
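
The check described in the comments can be sketched as follows, assuming a made-up uniform sample of 20 values (the names `d`, `p_default`, and `p_bounded` are illustrative). It compares the total count with and without `boundary`, and also applies the `expand = c(0, 0)` suggestion. The number of bars drawn may differ between ggplot2 versions, so the sketch compares totals rather than bin counts.

```r
library(ggplot2)

# Made-up data: 20 uniformly distributed values.
set.seed(42)
d <- data.frame(x = runif(20))

# Default binning: the outer bins can extend past min and max.
p_default <- ggplot(d, aes(x)) +
  geom_histogram(bins = 5)

# Bin edges aligned to the minimum, no padding on the x axis.
p_bounded <- ggplot(d, aes(x)) +
  geom_histogram(bins = 5, boundary = min(d$x)) +
  geom_vline(xintercept = range(d$x), color = "red") +
  scale_x_continuous(expand = c(0, 0))

# Pull out the computed bins and compare the totals.
default_bins <- layer_data(p_default, 1)
bounded_bins <- layer_data(p_bounded, 1)
sum(default_bins$count) == sum(bounded_bins$count)
```

If the totals match, no observations were dropped by adding `boundary`; only the bin edges moved.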