How to remove outline between columns in geom_histogram

Question

I have a histogram that uses color to distinguish categories of data. When I make the plot, there is an outline along the bottom of the whole plot, including the empty bins, drawn in the last color in the sequence. It looks bad since there are a lot of gaps in the data.

Here's an MRE using mtcars:

library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x=mpg)) + geom_histogram(bins=15, colour='red')

Note that two of the columns have a count of 0, but they still get a red outline along the baseline. I would like to remove that if possible.

I was able to hide it with a horizontal line, but I was hoping there was a better way to do it without adding another geom.

library(ggplot2)
data(mtcars)

ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=15, colour='red') +
  geom_hline(yintercept = 0, color = "white")

`uses color to distinguish categories of data`... how? You assign a static color to all columns, so there's no "using the last color" in this. Closer to what you describe might be `geom_histogram(bins=15, aes(colour=factor(cyl)))`. FYI, it might not be the "last color" so much as "all colors", but since all colors are opaque (`alpha=1`), you only see the top-most (last-drawn) one. – r2evans, Feb 14 '23 at 19:37

score 3 · Accepted Answer · answered Feb 14 '23 at 19:45 by nrennie

You can change how the y aesthetic is computed so that it returns NA instead of 0 for empty bins. ggplot2 will emit a `Removed 8 rows containing missing values ('position_stack()')` warning, but the resulting plot is correct:

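library(ggplot2)

# y is evaluated after binning: empty bins get NA instead of 0, so no bar
# (and therefore no outline) is drawn for them
ggplot(mtcars, aes(x = mpg,
                   y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
  geom_histogram(binwidth = 1, colour='red')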
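The same trick should carry over to the situation the question actually describes, where colour marks categories. A minimal sketch, assuming the categories are `factor(cyl)` as suggested in the comment on the question and keeping the question's `bins = 15`. It uses `layer_data()` to check that every bar left to draw has a nonzero count (the "Removed ... rows" warning appears here too):

```r
library(ggplot2)

# Colour mapped to a category (factor(cyl), as suggested in the comment on the
# question). after_stat() is evaluated per group after binning, so each group's
# empty bins get y = NA and are dropped before drawing (with a warning).
p <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl),
                        y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
  geom_histogram(bins = 15)

# layer_data() builds the plot and returns the rows the geom will actually draw
d <- layer_data(p)
summary(d$count)
p
```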

_This_ is why I need to learn more about `after_stat` and friends. – r2evans Feb 14 '23 at 19:46
Thank you. The hline trick cut off the bottom of each column which looked terrible. This is much better! – Seth Goodnight Feb 14 '23 at 20:48

score 0 · Answer 2 · answered Feb 14 '23 at 19:40

You could post-process the built plot with ggplot_build(), changing the outline colour from red to white wherever the bar height (y, the bin count) is 0, like this:

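library(ggplot2)
data(mtcars)
p <- ggplot(mtcars, aes(x=mpg)) + 
  geom_histogram(bins=15, colour='red')
# build the plot to get the per-bar data (one row per bin)
q <- ggplot_build(p)
# recolour the outline of empty bins (height 0) to white
q$data[[1]]$colour <- with(q$data[[1]], ifelse(y == 0, 'white', colour))
# convert the modified build to a gtable and draw it
q <- ggplot_gtable(q)
plot(q)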

Created on 2023-02-14 with reprex v2.0.2
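A variant of the ggplot_build() approach above, sketched under the assumption that hiding the outline entirely is acceptable: setting the outline colour of empty bins to `NA` draws no border at all, so the result does not depend on what sits behind the baseline (against a non-white panel background such as the default theme_grey(), a white line remains visible). It draws the edited gtable with `grid::grid.draw()` instead of `plot()`:

```r
library(ggplot2)
library(grid)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 15, colour = "red")

# Build the plot so the per-bar drawing data (one row per bin) can be edited
b <- ggplot_build(p)
bars <- b$data[[1]]

# NA colour means no border is drawn; empty bins have zero height, so nothing shows
bars$colour[bars$count == 0] <- NA
b$data[[1]] <- bars

# Convert the edited build to a gtable and draw it on a fresh page
g <- ggplot_gtable(b)
grid.newpage()
grid.draw(g)
```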

score 0 · Answer 3 · answered Feb 14 '23 at 19:44

Another option is to use geom_col on already-summarized data, removing 0-counts.

h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)
h
# $breaks
#  [1] 10 12 14 16 18 20 22 24 26 28 30 32 34
# $counts
#  [1] 2 1 7 3 5 5 2 2 1 0 2 2
# $density
#  [1] 0.031250 0.015625 0.109375 0.046875 0.078125 0.078125 0.031250 0.031250 0.015625 0.000000 0.031250 0.031250
# $mids
#  [1] 11 13 15 17 19 21 23 25 27 29 31 33
# $xname
# [1] "mtcars$mpg"
# $equidist
# [1] TRUE
# attr(,"class")
# [1] "histogram"

(Note that breaks has length 13 and counts has length 12; that is intentional, since 12 bins need 13 edges.)

You may need to tune the breaks=12 that I used so the bins better reflect what you want in the plot. The choice can (and does) affect where one sees "0" in the tabulation.

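library(ggplot2)

# one row per bin: x is the bin midpoint, y its count
data.frame(x = h$breaks[-1] - diff(h$breaks[1:2])/2, y = h$counts) |>
  subset(y > 0) |>   # drop empty bins so they get no outline
  ggplot(aes(x, y)) +
  geom_col(colour = 'red')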
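A self-contained sketch of the same geom_col() approach, assuming you want the bars to touch the way histogram bars do. It recreates `h` with `hist(..., plot = FALSE)` (the `plot = FALSE` argument is an assumption, since the original call is not shown), uses `h$mids` for the bin centres, and sets `width` to the bin width. By default geom_col() draws bars at 90% of the spacing between x values, which leaves small gaps:

```r
library(ggplot2)

# Tabulate with base R's hist() without drawing. breaks = 12 is only a
# suggestion passed to pretty(); here it yields bins of width 2 from 10 to 34.
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

# One row per bin. For these equal-width bins, h$mids equals
# h$breaks[-1] - binwidth / 2, the expression used above.
binwidth <- diff(h$breaks[1:2])
bins_df <- data.frame(x = h$mids, y = h$counts)

# Drop the empty bins, then draw bars exactly one bin wide so neighbours touch
nonzero <- subset(bins_df, y > 0)
ggplot(nonzero, aes(x, y)) +
  geom_col(width = binwidth, colour = "red")
```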