Preserve location of missing columns in bar plots without introducing zeros

Question

A previous question describes the problem that the location of missing bars is not preserved in ggplot2. The accepted solution there was to add zeros for the missing values. However, in some cases this is undesirable, because zero-height bars can be read as bars whose value is zero rather than as missing values. The example below illustrates the question.

Consider the following R code:

library(ggplot2)

data = data.frame(
  x = c("s", "s", "s", "r", "r", "r", "s", "s", "r", "r"),
  y = c(1, 2, 2, 3, 2, 1, 3, 1, 2, 1),
  z = c("A", "B", "C", "A", "B", "C", "A", "C", "A", "C"),
  group = c("group 1", "group 1", "group 1", "group 1", "group 1", "group 1",
            "group 2", "group 2", "group 2", "group 2")
)

ggplot(data, aes(x=x, y=y, fill=z)) + 
  facet_wrap(~group, ncol = 3) +
  stat_summary(fun=mean, geom="bar", position=position_dodge(preserve = "single"), color="grey70")

This code generates the following plot:

Observe that the green bar for z = B does not appear in group 2, because there is no data for z = B in group 2. I would like to have a gap between the red and blue bars in group 2 to emphasize the missing data. Currently the gap comes after the blue bar.

We can fill in the missing combinations with 0 so that a slot for B is kept between bars A and C. That is:

data <- rbind(data, list("s", 0, "B", "group 2"))
data <- rbind(data, list("r", 0, "B", "group 2"))

The problem is that (as is shown below) this solution draws bars with height zero. This is confusing because one may think that there is some data with value zero.

I would prefer having no bars and preserving the space. Moreover, I would like to write a vertical text "no data" over the missing bar. Is this possible?

r2evans · Answer 1 · 2021-01-29T19:45:48.803


If you tidyr::complete the data going in to ggplot2, it will fill the missing combinations with NA, which will be safely empty in the plot.

# library(tidyr)
tidyr::complete(data, group, x, z)
# # A tibble: 12 x 4
#    group   x     z         y
#    <chr>   <chr> <chr> <dbl>
#  1 group 1 r     A         3
#  2 group 1 r     B         2
#  3 group 1 r     C         1
#  4 group 1 s     A         1
#  5 group 1 s     B         2
#  6 group 1 s     C         2
#  7 group 2 r     A         2
#  8 group 2 r     B        NA
#  9 group 2 r     C         1
# 10 group 2 s     A         3
# 11 group 2 s     B        NA
# 12 group 2 s     C         1

ggplot(tidyr::complete(data, group, x, z), aes(x=x, y=y, fill=z)) + 
  facet_wrap(~group, ncol = 3) +
  stat_summary(fun=mean, geom="bar", position=position_dodge(preserve = "single"), color="grey70")
# Warning: Removed 2 rows containing non-finite values (stat_summary).

Add na.rm=TRUE to the stat_summary call to hide the warning.

(Unfortunately, this does not keep the empty space where it should be.)
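One possible workaround, not part of the original answer and offered only as a sketch: the warning above suggests that stat_summary drops the NA rows before the bars are dodged, so the B slot is gone by the time positions are computed. If the means are computed beforehand and the completed table is drawn with geom_col, the NA rows should still be present when position_dodge runs and are only discarded when the bars are drawn. The `means`, `completed` and `label` names below are hypothetical, introduced for illustration, and the behaviour is an assumption that may vary between ggplot2 versions.

```r
library(ggplot2)

# Same data as in the question.
data <- data.frame(
  x = c("s", "s", "s", "r", "r", "r", "s", "s", "r", "r"),
  y = c(1, 2, 2, 3, 2, 1, 3, 1, 2, 1),
  z = c("A", "B", "C", "A", "B", "C", "A", "C", "A", "C"),
  group = c(rep("group 1", 6), rep("group 2", 4))
)

# Step 1: compute the per-bar means up front instead of inside stat_summary().
means <- aggregate(y ~ group + x + z, data = data, FUN = mean)

# Step 2: add the missing group/x/z combinations, with y = NA.
completed <- tidyr::complete(means, group, x, z)

# Hypothetical helper column: label text only for the missing combinations.
completed$label <- ifelse(is.na(completed$y), "no data", NA)

# Step 3: draw with geom_col(), so the NA rows still take part in dodging.
# Both layers use the same dodge width so the text lines up with the empty slot.
p <- ggplot(completed, aes(x = x, y = y, fill = z)) +
  facet_wrap(~group, ncol = 3) +
  geom_col(position = position_dodge(width = 0.9), color = "grey70",
           na.rm = TRUE) +
  geom_text(aes(y = 0, label = label),
            position = position_dodge(width = 0.9),
            angle = 90, hjust = -0.1, na.rm = TRUE)
p
```

Here na.rm = TRUE only silences the warnings about removed rows. The vertical "no data" text is anchored at y = 0 and nudged slightly upwards with hjust, which may need adjusting for real data.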

edited Jan 29 '21 at 19:45

answered Jan 29 '21 at 19:40

r2evans



The problem with filling the missing values with `NA` is that bars A and C end up next to each other. The location for B is not preserved as it is in the second plot in the question. – Daniel Hernández Jan 29 '21 at 19:47
Yeah, ergo my comment in the answer about that. I'm surprised that not even `factor`s safeguarded that. – r2evans Jan 29 '21 at 19:56
So, do you think the answer is that keeping the space without using zeros is impossible with the current state of the ggplot2 library? – Daniel Hernández Jan 30 '21 at 07:41
Unfortunately, I cannot think of an alternative. If you add the "no data" text, I believe it will help mitigate the problem where "0" shows a small line. If it were me, I would keep the 0s and their lines (but I don't know what your real data is, so I may not have the full picture). – r2evans Jan 30 '21 at 12:56
