I am trying to plot a histogram of two overlapping distributions in ggplot2. Unfortunately, the graphic needs to be in black and white. I tried representing the two categories with different shades of grey, with transparency, but the result is not as clear as I would like. I tried adding outlines to the bars with different linetypes, but this produced some strange results.

library(ggplot2)
set.seed(65)
a <- rnorm(100, mean = 1, sd = 1)
b <- rnorm(100, mean = 3, sd = 1)
dat <- data.frame(category = rep(c('A', 'B'), each = 100),
                  values = c(a, b))

ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
        geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 1) +
        scale_fill_grey()

[Image: histogram]

Notice that one of the outlines that should appear dotted is in fact solid (at x = 4). I think this is because it is actually two lines drawn on top of each other: one from the 3-4 bar and one from the 4-5 bar. Their dots are out of phase, so together they look like a solid line. The effect is rather ugly and inconsistent.

  1. Is there any way of fixing this overlap?
  2. Can anyone suggest a more effective way of clarifying the difference between the two categories, without resorting to colour?

Many thanks.

user2390246

2 Answers

One possibility would be to use a 'hollow histogram', as described here:

# assign your original plot object to a variable 
p1 <- ggplot(data = dat, aes(x = values, linetype = category, fill = category)) +
  geom_histogram(colour = 'black', position = 'identity', alpha = 0.4, binwidth = 0.4) +
  scale_fill_grey()
# p1

# extract relevant variables from the plot object to a new data frame
# your grouping variable 'category' is named 'group' in the plot object
df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]

# plot using geom_step
ggplot(data = df, aes(x = xmin, y = y, linetype = factor(group))) +
  geom_step()
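
A possibly shorter route to a similar hollow outline is to let geom_step do the binning itself through stat = "bin", which skips the ggplot_build step. This is only a sketch of an alternative, not the code above: it assumes ggplot2 3.3.0 or later (for direction = "mid"), and the object name p_step is made up for illustration.

```r
library(ggplot2)

# same simulated data as in the question
set.seed(65)
dat <- data.frame(
  category = rep(c("A", "B"), each = 100),
  values   = c(rnorm(100, mean = 1, sd = 1), rnorm(100, mean = 3, sd = 1))
)

# stat = "bin" computes the per-bin counts for each category;
# direction = "mid" (assumed: ggplot2 >= 3.3.0) centres each step on its bin
p_step <- ggplot(dat, aes(x = values, linetype = category)) +
  geom_step(stat = "bin", binwidth = 0.4, direction = "mid") +
  theme_bw()
p_step
```

Because the counts are computed inside the layer, changing binwidth only needs to be done in one place.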

If you want to vary both linetype and fill, plot a histogram first (which can be filled) and set its outline colour to transparent, then add the geom_step layer on top. Use theme_bw to avoid 'grey elements on grey background'.

p1 <- ggplot() +
  geom_histogram(data = dat, aes(x = values, fill = category),
                 colour = "transparent", position = 'identity', alpha = 0.4, binwidth = 0.4) +
  scale_fill_grey()

df <- ggplot_build(p1)$data[[1]][ , c("xmin", "y", "group")]
df$category <- factor(df$group, labels = c("A", "B"))

p1 +
  geom_step(data = df, aes(x = xmin, y = y, linetype = category)) +
  theme_bw()

Henrik
First, I would recommend theme_set(theme_bw()) or theme_set(theme_classic()). Either one sets the background to white, which makes it much easier to see shades of gray.

Second, you could try something like scale_linetype_manual(values=c(1,3)) -- this won't completely eliminate the artifacts you're unhappy about, but it might make them a little less prominent since linetype 3 is sparser than linetype 2.
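
As a rough sketch, the two suggestions above could be combined with the plot from the question like this. The data are regenerated so that the snippet runs on its own, and the object name p_lt is made up for illustration.

```r
library(ggplot2)

# white background for every subsequent plot in the session
theme_set(theme_bw())

# same simulated data as in the question
set.seed(65)
dat <- data.frame(
  category = rep(c("A", "B"), each = 100),
  values   = c(rnorm(100, mean = 1, sd = 1), rnorm(100, mean = 3, sd = 1))
)

# linetype 1 is solid and linetype 3 is dotted
p_lt <- ggplot(dat, aes(x = values, linetype = category, fill = category)) +
  geom_histogram(colour = "black", position = "identity",
                 alpha = 0.4, binwidth = 1) +
  scale_fill_grey() +
  scale_linetype_manual(values = c(1, 3))
p_lt
```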

Short of drawing density plots instead (which won't work very well for small samples and may not be familiar to your audience), dodging the positions of the histograms (which is ugly), or otherwise departing from histogram conventions, I can't think of a better solution.
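
For comparison, the density-plot and dodged-histogram alternatives mentioned above might look roughly like the following. This is a sketch with default smoothing and an arbitrary binwidth, not a recommendation, and the object names p_dens and p_dodge are made up for illustration.

```r
library(ggplot2)

# same simulated data as in the question
set.seed(65)
dat <- data.frame(
  category = rep(c("A", "B"), each = 100),
  values   = c(rnorm(100, mean = 1, sd = 1), rnorm(100, mean = 3, sd = 1))
)

# density curves distinguished by linetype only
p_dens <- ggplot(dat, aes(x = values, linetype = category)) +
  geom_density() +
  theme_bw()

# histogram bars placed side by side instead of overlapping
p_dodge <- ggplot(dat, aes(x = values, fill = category)) +
  geom_histogram(colour = "black", position = "dodge", binwidth = 1) +
  scale_fill_grey() +
  theme_bw()

p_dens
p_dodge
```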

Ben Bolker