I have a histogram that uses color to distinguish categories of data. When I make the plot, there is a line across the whole thing using the last color in the sequence. It looks bad since there are a lot of gaps in the data.

Here's an MRE using mtcars:

library(ggplot2)
data(mtcars)
ggplot(mtcars, aes(x=mpg)) + geom_histogram(bins=15, colour='red')

Note that there are two columns with 0 values in them, but they still have the red outline. I would like to remove that if possible.

I was able to hide it with a horizontal line, but I was hoping there was a better way to do it without adding another geom.

library(ggplot2)
data(mtcars)

ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=15, colour='red') +
  geom_hline(yintercept = 0, color = "white")
    `uses color to distinguish categories of data`... how? you assign the static color to all columns, so there's no "using the last color" in this. Close to this might be using `geom_histogram(bins=15, aes(colour=factor(cyl)))` instead. FYI, it might not be the "last color" as much as "all colors", but since all colors are opaque (`alpha=1`), you only see the top-most (last-drawn). – r2evans Feb 14 '23 at 19:37

3 Answers

You can change the y computation so that it returns NA instead of 0 for empty bins. This emits the warning "Removed 8 rows containing missing values ('position_stack()').", but the resulting plot is correct:

ggplot(mtcars, aes(x = mpg,
                   y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
  geom_histogram(binwidth = 1, colour='red') 

nrennie

You could change the red colour to white when your y value is 0 using ggplot_build like this:

library(ggplot2)
data(mtcars)
p <- ggplot(mtcars, aes(x=mpg)) + 
  geom_histogram(bins=15, colour='red')
q <- ggplot_build(p)
q$data[[1]]$colour <- with(q$data[[1]], ifelse(y == 0, 'white', colour))
q <- ggplot_gtable(q)
plot(q)

Created on 2023-02-14 with reprex v2.0.2
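A possible variation on the approach above, not part of the original answer: white only blends in where the background behind the bars is white. As a minimal sketch, the outline of the empty bins can be set to NA instead, which ggplot2 and grid treat as "draw no outline". The names `built`, `layer1`, and `g` are illustrative, and it assumes the layer data has the `y` and `colour` columns used above.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 15, colour = "red")

# Compute the layer data without drawing anything yet
built <- ggplot_build(p)
layer1 <- built$data[[1]]

# NA as an outline colour means "no outline" for that bar
layer1$colour <- ifelse(layer1$y == 0, NA_character_, layer1$colour)
built$data[[1]] <- layer1

# Convert the modified build into a gtable and draw it
g <- ggplot_gtable(built)
grid::grid.newpage()
grid::grid.draw(g)
```

Like the original, this edits the built plot rather than the ggplot object, so further `+` layers cannot be added afterwards.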

Quinten

Another option is to use geom_col on already-summarized data, removing 0-counts.

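# Assumed call (not shown in the original), inferred from xname and the breaks=12 mentioned below
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)
h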
# $breaks
#  [1] 10 12 14 16 18 20 22 24 26 28 30 32 34
# $counts
#  [1] 2 1 7 3 5 5 2 2 1 0 2 2
# $density
#  [1] 0.031250 0.015625 0.109375 0.046875 0.078125 0.078125 0.031250 0.031250 0.015625 0.000000 0.031250 0.031250
# $mids
#  [1] 11 13 15 17 19 21 23 25 27 29 31 33
# $xname
# [1] "mtcars$mpg"
# $equidist
# [1] TRUE
# attr(,"class")
# [1] "histogram"

(Note that breaks has length 13 and counts has length 12; that's intentional, since breaks holds the bin edges.)
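The relationship between edges, counts, and midpoints can be checked in base R. This is a small sketch, assuming `h` came from a `hist()` call like the one in the first line (the original call is not shown). The names `edge_surplus`, `mids_from_breaks`, `same_as_mids`, and `empty_mids` are illustrative. The midpoint expression is the same one used for `x` in the geom_col code further down.

```r
# Assumption: h came from a call like this; the original call is not shown.
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

# n bins are delimited by n + 1 edges, hence the length difference of 1
edge_surplus <- length(h$breaks) - length(h$counts)

# Midpoints computed from the edges, assuming equal-width bins (h$equidist)
mids_from_breaks <- h$breaks[-1] - diff(h$breaks[1:2]) / 2
same_as_mids <- isTRUE(all.equal(mids_from_breaks, h$mids))

# Midpoints of the empty bins, i.e. the rows a y > 0 filter would drop
empty_mids <- h$mids[h$counts == 0]

print(edge_surplus)
print(same_as_mids)
print(empty_mids)
```

If the bins were not equal-width, `h$mids` would be the safer choice for the x positions.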

Adjust the breaks=12 that I used to better reflect what you want in the plot. It can (and does) affect where one sees "0" in the tabulation.

data.frame(x = h$breaks[-1] - diff(h$breaks[1:2])/2, y = h$counts) |>
  subset(y > 0) |>
  ggplot(aes(x, y)) +
  geom_col(colour = 'red')

r2evans