
A previous question describes the problem that the position of missing bars is not preserved in ggplot2. The accepted solution there was to add zeros for the missing values. However, this is not always desirable, because zero-height bars can be read as bars whose value is zero rather than as missing values. The example below illustrates the question.

Consider the following R code:

library(ggplot2)

data = data.frame(
  x = c("s", "s", "s", "r", "r", "r", "s", "s", "r", "r"),
  y = c(1, 2, 2, 3, 2, 1, 3, 1, 2, 1),
  z = c("A", "B", "C", "A", "B", "C", "A", "C", "A", "C"),
  group = c("group 1", "group 1", "group 1", "group 1", "group 1", "group 1",
            "group 2", "group 2", "group 2", "group 2")
)

ggplot(data, aes(x=x, y=y, fill=z)) + 
  facet_wrap(~group, ncol = 3) +
  stat_summary(fun=mean, geom="bar", position=position_dodge(preserve = "single"), color="grey70")

This code generates the following plot:

enter image description here

Observe that the green bar for z = B does not appear in group 2, because group 2 has no data for z = B. I would like a gap between the red and blue bars in group 2 to emphasize the missing data. Currently the gap appears after the blue bar.

We can set the missing values to 0 to separate bars A and C. That is:

data <- rbind(data, list("s", 0, "B", "group 2"))
data <- rbind(data, list("r", 0, "B", "group 2"))

The problem, as shown below, is that this solution draws bars of height zero. This is confusing because a reader may think the data contains an actual value of zero.

enter image description here

I would prefer to have no bars while preserving the space. I would also like to write a vertical "no data" label over the missing bar. Is this possible?

Daniel Hernández

1 Answer


If you pass the data through tidyr::complete before it goes into ggplot2, the missing combinations are filled with NA, which are safely left empty in the plot.

# library(tidyr)
tidyr::complete(data, group, x, z)
# # A tibble: 12 x 4
#    group   x     z         y
#    <chr>   <chr> <chr> <dbl>
#  1 group 1 r     A         3
#  2 group 1 r     B         2
#  3 group 1 r     C         1
#  4 group 1 s     A         1
#  5 group 1 s     B         2
#  6 group 1 s     C         2
#  7 group 2 r     A         2
#  8 group 2 r     B        NA
#  9 group 2 r     C         1
# 10 group 2 s     A         3
# 11 group 2 s     B        NA
# 12 group 2 s     C         1

ggplot(tidyr::complete(data, group, x, z), aes(x=x, y=y, fill=z)) + 
  facet_wrap(~group, ncol = 3) +
  stat_summary(fun=mean, geom="bar", position=position_dodge(preserve = "single"), color="grey70")
# Warning: Removed 2 rows containing non-finite values (stat_summary).

Add na.rm=TRUE to the stat_summary call to hide the warning.
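As a minimal sketch of that last step, the plot call with the warning suppressed could look like the following. It assumes the example data from the question and that tidyr is installed; the object name `p` is only illustrative.

```r
library(ggplot2)

# Example data from the question.
data <- data.frame(
  x = c("s", "s", "s", "r", "r", "r", "s", "s", "r", "r"),
  y = c(1, 2, 2, 3, 2, 1, 3, 1, 2, 1),
  z = c("A", "B", "C", "A", "B", "C", "A", "C", "A", "C"),
  group = c(rep("group 1", 6), rep("group 2", 4))
)

# Same plot as above; na.rm = TRUE drops the NA rows without a warning.
p <- ggplot(tidyr::complete(data, group, x, z), aes(x = x, y = y, fill = z)) +
  facet_wrap(~group, ncol = 3) +
  stat_summary(fun = mean, geom = "bar",
               position = position_dodge(preserve = "single"),
               color = "grey70",
               na.rm = TRUE)
p
```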

Resulting plot: the missing groups have no bar, rather than a zero-height bar.

(Unfortunately, this does not keep the empty space where it should be.)

r2evans
  • The problem with filling the missing values with `NA` is that bars A and C come together. The position of B is not preserved as it is in the second plot in the question. – Daniel Hernández Jan 29 '21 at 19:47
  • Yeah, ergo my comment in the answer about that. I'm surprised that not even `factor`s safeguarded against that. – r2evans Jan 30 '21 at 19:56
  • So, do you think the answer is that keeping the space without using zeros is impossible with the current state of the ggplot2 library? – Daniel Hernández Jan 30 '21 at 07:41
  • Unfortunately, I cannot think of an alternative. If you add the "no data" text, I believe it will help mitigate the problem where "0" shows a small line. If it were me, I would keep the 0s and their lines (but I don't know what your real data is, so I may not have the full picture). – r2evans Jan 30 '21 at 12:56
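
A possible workaround, offered only as a sketch and not taken from the answer or the comments above: compute the means before plotting, complete the summarised data with NA, and draw it with geom_col instead of stat_summary. This relies on the assumption that geom_col, unlike stat_summary, drops rows with missing values only when the bars are drawn, after dodging has assigned each bar its position, so the slot for B should stay empty. The names `means`, `plot_data`, `label`, and `dodge` are illustrative, and the label placement (`y = 0`, `hjust = -0.1`) is a guess that may need tuning for real data.

```r
library(ggplot2)

# Example data from the question.
data <- data.frame(
  x = c("s", "s", "s", "r", "r", "r", "s", "s", "r", "r"),
  y = c(1, 2, 2, 3, 2, 1, 3, 1, 2, 1),
  z = c("A", "B", "C", "A", "B", "C", "A", "C", "A", "C"),
  group = c(rep("group 1", 6), rep("group 2", 4))
)

# Compute the means up front instead of inside stat_summary().
means <- aggregate(y ~ group + x + z, data = data, FUN = mean)

# Fill the missing group/x/z combinations with NA.
plot_data <- tidyr::complete(means, group, x, z)

# Illustrative helper column: label only the combinations without data.
plot_data$label <- ifelse(is.na(plot_data$y), "no data", "")

# Share one dodge between bars and labels; 0.9 is the default bar width.
dodge <- position_dodge(width = 0.9)

p <- ggplot(plot_data, aes(x = x, y = y, fill = z)) +
  facet_wrap(~group, ncol = 3) +
  # NA rows take part in dodging and are dropped silently when drawing.
  geom_col(position = dodge, color = "grey70", na.rm = TRUE) +
  # Vertical text starting just above the baseline of each missing bar.
  geom_text(aes(y = 0, label = label, group = z),
            position = dodge, angle = 90, hjust = -0.1)
p
```

Because the labels are empty strings for every combination that has data, only the two missing B bars in group 2 get visible text.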