ggplot2: forcing space for empty second-level category

Question

I'm trying to keep "empty space" for multi-level grouped boxplots.

library(ggplot2)

set.seed(42)
n <- 100
dat <- data.frame(x=runif(n),
                  cat1=sample(letters[1:4], size=n, replace=TRUE),
                  cat2=sample(LETTERS[1:3], size=n, replace=TRUE))
# One box per cat1/cat2 combination, dodged within each cat1 position
ggplot(dat, aes(cat1, x)) + geom_boxplot(aes(fill=cat2))

If I force one of the groups to be empty:

dat <- subset(dat, ! (cat1 == 'b' & cat2 == 'B'))
table(dat$cat1, dat$cat2)
##    
##      A  B  C
##   a  9  9  7
##   b  8  0  5
##   c 13 11  6
##   d 11 10  5
ggplot(dat, aes(cat1, x)) + geom_boxplot(aes(fill=cat2))

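The second group, "b", is now expanded to fill the space. What I'd like instead is for "b" to keep an empty slot where its "B" box would have been, so that its "A" and "C" boxes keep the same width and position as in the other groups.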

The approach from SO 9818835 (forcing an empty level to appear) works fine for the top level, but I can't figure out how to make it work for a second level of categories. In `scale_x_discrete(...)`, I tried setting:

- `breaks=letters[1:4]`
- `breaks=LETTERS[1:3]`
- `breaks=list(letters[1:4], LETTERS[1:3])` (a stab)
- `breaks=NULL`
- `breaks=func`, where `func <- function(x, ...) { browser(); 1; }`, in order to troubleshoot; it only offered `letters[1:4]` and never prompted for the second level
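For reference, the top-level technique can be sketched roughly as follows. This is a minimal sketch, assuming `cat1` is turned into a factor with a hypothetical fifth level `"e"` that has no rows; `scale_x_discrete(drop = FALSE)` then keeps an empty slot for `"e"` on the x axis. It does not help with the missing `"B"` box inside `"b"`: with the default dodging, the boxes at each x position share that position's width among the `cat2` groups actually present there.

```r
library(ggplot2)

set.seed(42)
n <- 100
dat <- data.frame(x = runif(n),
                  cat1 = sample(letters[1:4], size = n, replace = TRUE),
                  cat2 = sample(LETTERS[1:3], size = n, replace = TRUE))

# Hypothetical extra top-level category "e" that has no rows at all.
dat$cat1 <- factor(dat$cat1, levels = letters[1:5])

# drop = FALSE keeps the unused level "e" on the x axis, so an empty
# slot is reserved for it. Nothing similar happens for a cat2 box that
# is missing inside an existing cat1 group.
p_top <- ggplot(dat, aes(cat1, x)) +
  geom_boxplot(aes(fill = cat2)) +
  scale_x_discrete(drop = FALSE)
print(p_top)
```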

Using `interaction(letters[1:4], LETTERS[1:3])` still does not leave empty space. I also tried a workaround: injecting an out-of-bounds `x` value and pushing it off the screen with `scale_y_continuous(limits)`, but ggplot2 is too smart for me and closes the gap again.
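The gap closes because limits set on a scale are applied to the data itself: values outside `scale_y_continuous(limits = ...)` are turned into `NA` and dropped (with a warning) before the boxplot statistics are computed, so the b/B group has nothing left to draw. A minimal sketch that checks this with `layer_data()`, which returns one row per computed box; the off-screen value `2` is an arbitrary assumption, and the `coord_cartesian()` version is included only as a zoom-only contrast.

```r
library(ggplot2)

set.seed(42)
n <- 100
dat <- data.frame(x = runif(n),
                  cat1 = sample(letters[1:4], size = n, replace = TRUE),
                  cat2 = sample(LETTERS[1:3], size = n, replace = TRUE))

# Push the b/B values above the visible range (2 is an arbitrary choice).
out_of_range <- dat$cat1 == "b" & dat$cat2 == "B"
dat$x[out_of_range] <- 2

# Scale limits censor out-of-range values to NA; those rows are removed
# before stat_boxplot runs, so the b/B group disappears entirely.
p_scale <- ggplot(dat, aes(cat1, x)) +
  geom_boxplot(aes(fill = cat2)) +
  scale_y_continuous(limits = c(0, 1))

# Coordinate limits only zoom the view; every group still gets a box.
p_coord <- ggplot(dat, aes(cat1, x)) +
  geom_boxplot(aes(fill = cat2)) +
  coord_cartesian(ylim = c(0, 1))

# Number of computed boxes in each version (the scale version warns
# about removed rows, which is expected here).
n_scale <- nrow(suppressWarnings(layer_data(p_scale)))
n_coord <- nrow(layer_data(p_coord))
```

If the explanation holds, `n_scale` comes out exactly one smaller than `n_coord`: only the b/B box is lost.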

Are there elegant (i.e., "correct" in ggplot2 mechanisms) solutions?

How elegant does it need to be? Just setting `x` to zero for these records seems to create something that looks quite reasonable. `dat <- dat %>% mutate(x = ifelse(cat1 == 'b' & cat2 == 'B', 0, x))` — akhmed, Oct 21 '15 at 22:12
That's elegant programmatically (and I had already tried it, sans the `scale_y_continuous(limits)` step), but I'm a little OCD when it comes to my visualizations: I'll always stare at the distracting line at the bottom of the plot. — r2evans, Oct 21 '15 at 22:16
Plus there's the statistical difference between "a line means no data" and "a line indicates a single data point of value 0". — r2evans, Oct 21 '15 at 22:18
I see. So the main issue is the extra line. Why not use `coord_cartesian` then instead of `scale_y_continuous`? — akhmed, Oct 21 '15 at 22:24

akhmed · Accepted Answer · 2015-10-21T22:30:30.937


Could `coord_cartesian` be the solution you are looking for?

It only zooms in on the plot; unlike `scale_y_continuous`, it does not try to "outsmart" the data by dropping the values that fall outside its limits.

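library(dplyr)
library(ggplot2)

set.seed(42)
n <- 100
dat <- data.frame(x=runif(n),
                  cat1=sample(letters[1:4], size=n, replace=TRUE),
                  cat2=sample(LETTERS[1:3], size=n, replace=TRUE))

# Any value above the visible y range (0 to 1 here) works
LARGE_VALUE <- 2

# Move the b/B points off-screen instead of deleting them, so that
# group still gets its own slot when the boxes are dodged
dat <- dat %>%
  mutate(x = ifelse(cat1 == 'b' & cat2 == 'B', 
                    LARGE_VALUE,
                    x))

# coord_cartesian only zooms the view; unlike scale limits, it does not
# drop the out-of-range data before the boxplots are computed
ggplot(dat, aes(cat1, x)) + 
  geom_boxplot(aes(fill=cat2)) + 
  coord_cartesian(ylim = c(0,1))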
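The code above starts from the full data, where the b/B rows still exist and can be moved off-screen. If those rows have already been removed, as with the `subset()` call in the question, there is nothing for `mutate()` to change, so placeholder rows have to be added back first. A minimal base-R sketch under that assumption: it adds one off-screen row for every `cat1`/`cat2` combination that has no data. The names `all_combos`, `empty_combos` and `placeholders` are illustrative, and the level sets are assumed to be `letters[1:4]` and `LETTERS[1:3]` as in the question.

```r
library(ggplot2)

set.seed(42)
n <- 100
dat <- data.frame(x = runif(n),
                  cat1 = sample(letters[1:4], size = n, replace = TRUE),
                  cat2 = sample(LETTERS[1:3], size = n, replace = TRUE))
# The question's version of the data: the b/B rows are gone entirely.
dat <- subset(dat, !(cat1 == "b" & cat2 == "B"))

LARGE_VALUE <- 2  # any value above the visible y range works

# Every cat1/cat2 combination that should get a slot on the plot.
all_combos <- expand.grid(cat1 = letters[1:4], cat2 = LETTERS[1:3],
                          stringsAsFactors = FALSE)
# Keep only the combinations that have no rows in dat.
present <- paste(dat$cat1, dat$cat2)
empty_combos <- all_combos[!paste(all_combos$cat1, all_combos$cat2) %in% present, ]

# One placeholder row per empty combination, placed off-screen.
placeholders <- data.frame(x = rep(LARGE_VALUE, nrow(empty_combos)),
                           cat1 = empty_combos$cat1,
                           cat2 = empty_combos$cat2)
dat_filled <- rbind(dat, placeholders)

# Zoom to the real data range; the placeholder boxes stay out of view.
p <- ggplot(dat_filled, aes(cat1, x)) +
  geom_boxplot(aes(fill = cat2)) +
  coord_cartesian(ylim = c(0, 1))
print(p)
```

Because every combination now has at least one row, each `cat1` position is split into the same three slots, and the placeholder boxes sit outside the zoomed window.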

answered Oct 21 '15 at 22:25 by akhmed (edited Oct 21 '15 at 22:30)


Yup, that does what I need. I'm a little surprised that `drop=FALSE` doesn't do what I expected, but then again I need to work more to fully grok `ggplot2`. Thanks. – r2evans Oct 21 '15 at 22:30
