
I'm trying to color a ggplot histogram differently based on precise boundaries along the x axis. However, the colors are not accurate: a bin that contains values on both sides of the cutoff shows up as a single bar split horizontally into two stacked colors. Minimal example code below.

I would like to split the bin by color vertically, so that all values to the left of the cutoff line are one color and all values to the right of the cutoff line are the other color.

How can I accomplish this?

I think geom_density would not have this problem, but I would prefer to use geom_histogram instead of geom_density because the histogram shows actual counts on the y axis.

library(dplyr)
library(ggplot2)

cutoff_point <- 3.9
mtcars %>% 
  mutate(wt_color = ifelse(wt < cutoff_point, "red", "blue")) %>% 
  select(wt, wt_color) %>% 
  ggplot(aes(x=wt, fill = wt_color)) +
  geom_histogram(bins = 5) +
  geom_vline(xintercept=cutoff_point, colour="black")
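To see which bars are affected, one can inspect the bins ggplot2 actually computed with `layer_data()`. This is only an illustrative sketch: `find_mixed_bins()` is a hypothetical helper, and the explicit one-unit `breaks` are an assumption chosen so the bin edges are easy to read off. It lists every bin that holds cars from more than one fill group, i.e. exactly the bars that get stacked in two colors.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: bins of the first layer that contain data
# from more than one fill group (these are drawn as stacked, two-color bars)
find_mixed_bins <- function(p) {
  layer_data(p, 1) %>%
    filter(count > 0) %>%
    group_by(xmin, xmax) %>%
    summarise(n_fill_groups = n_distinct(fill), .groups = "drop") %>%
    filter(n_fill_groups > 1)
}

cutoff_point <- 3.9

p <- mtcars %>%
  mutate(wt_color = ifelse(wt < cutoff_point, "red", "blue")) %>%
  ggplot(aes(x = wt, fill = wt_color)) +
  # Assumed breaks one unit apart; the cutoff 3.9 falls inside the (3.5, 4.5] bin
  geom_histogram(breaks = seq(1.5, 5.5, by = 1))

find_mixed_bins(p)
```

With these breaks it should report a single bin, (3.5, 4.5], the one straddling `cutoff_point`. The helper works the same with `bins = 5`; only the bin edges change.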


The `boundary` argument (suggested in the comments) works well when I have just one cutoff point, but it doesn't work when I have two cutoff points, like below:

cutoff_point1 <- 2.5
cutoff_point2 <- 5.4

mtcars %>% 
  mutate(wt_color = case_when(
    wt < cutoff_point1 ~ "blue",
    wt < cutoff_point2 ~ "red",   # case_when() uses the first matching condition, so wt >= cutoff_point1 here
    TRUE ~ "green"
    )) %>% 
  select(wt, wt_color) %>% 
  ggplot(aes(x=wt, fill = wt_color)) +
  geom_histogram(bins = 5, boundary = cutoff_point1) +   # boundary accepts only a single value
  geom_vline(xintercept = c(cutoff_point1, cutoff_point2), colour = "black")
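Since `boundary` pins only one break, ggplot2 places the others at `boundary + k * binwidth`, so the second cutoff lands on a break only when the gap between the cutoffs is a whole multiple of the bin width. A minimal sketch of that idea, assuming equal-width bins are acceptable: set `binwidth` to the gap divided by a whole number (`bins_between = 3` is an arbitrary, hypothetical choice) and keep `boundary = cutoff_point1`.

```r
library(dplyr)
library(ggplot2)

cutoff_point1 <- 2.5
cutoff_point2 <- 5.4

# Hypothetical choice: split the gap between the cutoffs into 3 equal bins
bins_between <- 3
bin_width <- (cutoff_point2 - cutoff_point1) / bins_between

p <- mtcars %>%
  # <= so a value sitting exactly on a cutoff gets the color of the bin it is
  # counted in (geom_histogram's bins are closed on the right by default)
  mutate(wt_color = case_when(
    wt <= cutoff_point1 ~ "blue",
    wt <= cutoff_point2 ~ "red",
    TRUE ~ "green"
  )) %>%
  ggplot(aes(x = wt, fill = wt_color)) +
  # boundary pins one break to cutoff_point1; because the gap between the
  # cutoffs is a whole number of bin widths, cutoff_point2 also lands on a break
  geom_histogram(binwidth = bin_width, boundary = cutoff_point1) +
  geom_vline(xintercept = c(cutoff_point1, cutoff_point2), colour = "black")

print(p)
```

Because every bin has the same width, no bar is shortened by a narrow bin; the trade-off is that the number of bins is only controlled indirectly through `bins_between`.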


Harry M
  • You could supply your `geom_histogram` with `boundary = cutoff_point`. This will alter the binning so that one boundary is exactly at your cutoff – Julian_Hn Apr 17 '19 at 07:17
  • Ah, this works great when I have only two groups like in this example, but in my actual chart I have multiple cutoffs, and in that case it doesn't seem to do anything – Harry M Apr 17 '19 at 07:22
  • Ah, yes. You can only specify one cutoff point. I'll have to think about something more sophisticated then – Julian_Hn Apr 17 '19 at 07:23
  • I'll update the question with another example – Harry M Apr 17 '19 at 07:28

1 Answer


Maybe this'll work for you. You can specify the bin breaks directly with the `breaks` argument of `geom_histogram`. So we first create an evenly spaced vector of breaks and then add the cutoff points to it:

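library(dplyr)
library(ggplot2)

cutoff_point <- 3.9                 # cutoff used for coloring (from the question)
n.bins <- 5                         # number of evenly spaced break points (gives n.bins - 1 bins)
additional.cutoffs <- c(3.9, 2.9)   # extra break points placed exactly at the cutoffs

bins <- seq(min(mtcars$wt), max(mtcars$wt), length.out = n.bins)
bins <- c(bins, additional.cutoffs) %>% unique() %>% sort()   # unique() avoids duplicate breaks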

mtcars %>% 
  mutate(wt_color = ifelse(wt < cutoff_point, "red", "blue")) %>% 
  select(wt, wt_color) %>% 
  ggplot(aes(x=wt, fill = wt_color)) +
  geom_histogram(breaks = bins) +
  geom_vline(xintercept=additional.cutoffs, colour="black")
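The same approach extends to any number of cutoffs if the fill groups are built from the same cutoff vector as the breaks. A sketch under assumptions: `make_breaks()` is a hypothetical helper, and `cut()` replaces the `ifelse()` so that each interval between cutoffs gets its own group (here using the two cutoffs from the question's second example).

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: n_breaks evenly spaced breaks over the range of x,
# plus a break exactly at every cutoff (unique() drops duplicates)
make_breaks <- function(x, n_breaks, cutoffs) {
  even <- seq(min(x), max(x), length.out = n_breaks)
  sort(unique(c(even, cutoffs)))
}

cutoffs <- c(2.5, 5.4)
bins <- make_breaks(mtcars$wt, n_breaks = 5, cutoffs = cutoffs)

p <- mtcars %>%
  # One fill group per interval: up to 2.5, from 2.5 to 5.4, above 5.4
  mutate(wt_group = cut(wt, breaks = c(-Inf, cutoffs, Inf))) %>%
  ggplot(aes(x = wt, fill = wt_group)) +
  geom_histogram(breaks = bins) +
  geom_vline(xintercept = cutoffs, colour = "black")

print(p)
```

`cut()` closes intervals on the right by default, which matches `geom_histogram()`'s default `closed = "right"`, so a value sitting exactly on a cutoff is counted and colored in the same bin.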


f.lechleitner
  • Ah. I thought there was a way to add multiple breaks but could not remember for the life of me. Good solution! – Julian_Hn Apr 17 '19 at 07:34
  • Amazing! Thank you! – Harry M Apr 17 '19 at 07:43
  • This worked great, but on my dataset I realized the bars get chopped up and become short at the cutoff points. I asked another [question here](https://stackoverflow.com/questions/55724880/coloring-ggplot-histogram-by-precise-cut-off-points-by-splitting-the-bins-vertic) showing an example. Any idea how it might be fixed? – Harry M Apr 17 '19 at 09:58
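The short bars come from the extra breaks: a bin squeezed between an evenly spaced break and a cutoff is narrow, so it simply holds fewer cars. One possible fix, assuming the y axis may show a rate instead of a raw count: map `y` to `after_stat(count / width)`, using the `count` and `width` columns that `stat_bin()` computes, so each bar's area rather than its height equals the number of cars in it. This sketch reuses the answer's breaks and cutoff.

```r
library(dplyr)
library(ggplot2)

cutoff_point <- 3.9
additional.cutoffs <- c(3.9, 2.9)

bins <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
bins <- sort(unique(c(bins, additional.cutoffs)))

p <- mtcars %>%
  mutate(wt_color = ifelse(wt < cutoff_point, "red", "blue")) %>%
  # count / width = cars per unit of wt, so a narrow bin is not drawn
  # short just because it is narrow; bar area equals the count
  ggplot(aes(x = wt, fill = wt_color, y = after_stat(count / width))) +
  geom_histogram(breaks = bins) +
  geom_vline(xintercept = additional.cutoffs, colour = "black") +
  labs(y = "cars per unit of wt")

print(p)
```

The cost is that bar heights are no longer raw counts. If counts on the y axis matter more, equal-width bins whose width divides the gaps between the cutoffs avoid narrow bins altogether.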