I'm struggling with ggplot (I always do). There are a number of very similar questions about forcing ggplot to include zero-value categories in legends (here and here, for example). BUT I (think I) have a slightly different requirement, and none of my mucking about with scale_x_discrete and scale_fill_manual has helped.

Requirement: as you can see, the right-hand plot has no data in the TM=5 category, so that category is missing from the x-axis. I need the right-hand plot to show category 5 on the axis, but obviously with no points or box.

[Image: current plots; the right-hand plot has no TM=5 category]

Current Plot Script:

library(ggplot2)

#data
plotData <- data.frame("TM"    = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
                       "Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
#vars
xTitle <- bquote("T"["M"])
v.I    <- plotData$TM
depVar <- plotData$Score

#plot
p <- ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
  geom_point() +
  geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
  geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
  theme_bw() +
  labs(x=xTitle) +
  labs(y=NULL) +
  theme(legend.position='none', 
        axis.text=element_text(size=10, face="bold"),
        axis.title=element_text(size=16))

Attempted Solutions:

  1. Adding drop=FALSE to the scales (suggested by @Jarretinha here) totally borks the margins and x-axis labels:

    > p + scale_x_discrete(drop=FALSE) + scale_fill_manual(drop=FALSE)

[Image: result of the drop=FALSE attempt]

  2. Following the logic from here and manually setting the labels in scale_fill_manual does nothing; I get the same right-hand plot as in the example above.

    > p + scale_fill_manual(values = c("red", "blue", "green", "purple", "pink"), labels = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"), drop=FALSE)

  3. Playing with this logic and trying scale_x_discrete changes the category names on the x-axis, but the fifth category is still missing AND the margins are borked again (as in attempt 1). BUT it's apparent that scale_x_discrete is important, just NOT the whole answer.

    > p + scale_x_discrete(limits = c("Cat1", "Cat2", "Cat3", "Cat4", "Cat5"), drop=FALSE)

[Image: result of the scale_x_discrete(limits = ...) attempt]

ANSWER for the above example, courtesy of input from @Bouncyball and @aosmith:

library(ggplot2)

#data
plotData <- data.frame("TM"    = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
                       "Score" = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5))
plotData$TM <- factor(plotData$TM, levels=1:5) # declare all desired factor levels (1-5) in the input data

#vars
xTitle <- bquote("T"["M"])
v.I    <- plotData$TM
depVar <- plotData$Score
myPalette <- c('#5c9bd4','#a5a5a4','#4770b6','#275f92','#646464','#002060')

#plot
ggplot(plotData, aes_string(x=v.I,y=depVar,color=v.I)) +
  geom_jitter(alpha=0.8, position = position_jitter(width = 0.2, height = 0.2)) +
  geom_boxplot(width=0.75,alpha=0.5,aes_string(group=v.I)) +
  scale_colour_manual(values = myPalette, drop=FALSE) +  # new line added here
  scale_x_discrete(drop=FALSE) + # new line added here
  theme_bw() +
  labs(x=xTitle) +
  labs(y=NULL) +
  theme(legend.position='none', 
        axis.text=element_text(size=10, face="bold"),
        axis.title=element_text(size=16))

[Image: resulting plot with empty category 5 shown on the x-axis]
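Why the factor line matters (per @aosmith's comment): a numeric TM has no levels, so `drop = FALSE` has nothing to keep, whereas a factor created with `levels = 1:5` remembers category 5 even when it has zero rows. A minimal base-R check of that, reusing the TM values from `plotData` above:

```r
# TM values copied from plotData above (no 5 anywhere)
tm <- c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3)

# A numeric vector has no levels, so a discrete scale cannot know about 5
print(levels(tm))    # NULL

# A factor with explicit levels keeps the empty category
tm_f <- factor(tm, levels = 1:5)
print(levels(tm_f))  # levels "1" to "5"
print(table(tm_f))   # level "5" is listed with a count of 0
```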

Edited by Community; asked by BarneyC
  • Notice in your first link that recommends `drop = FALSE`, the x variable is a factor. Your x variable is currently numeric. If you make it a factor and make sure it has all 5 levels of interest (e.g., `plotData$TM = factor(plotData$TM, levels = 1:5)`) you can use the answers you've linked to. – aosmith Aug 23 '16 at 18:02
  • This is sort of an aside, but I'm very confused by `aes_string(x=v.I,y=depVar,color=v.I)`. `aes_string` is for passing the _names_ of columns in your data frame as strings, but you seem to be mapping raw numeric vectors, even though you passed the data frame in itself. Was that intentional? – joran Aug 23 '16 at 18:03
  • @joran yup, intentional. To get a reproducible example I just copied the base script from my much larger .RMD script, as these plots are produced from dynamically created data frames. Make more sense? – BarneyC Aug 23 '16 at 18:09
  • @aosmith - I sussed the factors thing based on bouncyball's help below and it made ALL the difference! Cheers – BarneyC Aug 23 '16 at 18:11
  • Sort of, except that it sort of makes your example a bit nonsensical. Because my first instinct was to recommend adding explicit factor levels as answered below, but doing that _in your data frame_ would actually **not** solve the problem in the example you presented because you have completely decoupled the data being plotted from your data frame. In the future, it would have been better to do `aes_string(x="TM",y="Score",color= "TM")`. – joran Aug 23 '16 at 18:13
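Following @joran's point, here is a sketch that maps column *names* instead of detached vectors. It assumes ggplot2 3.0.0 or later, where `aes_string()` is deprecated in favour of the `.data` pronoun; `xVar` and `yVar` are hypothetical stand-ins for whatever the dynamic script supplies. Because the mapping now points back into the data frame, factoring `TM` there is enough for `drop = FALSE` to keep category 5.

```r
library(ggplot2)

plotData <- data.frame(
  TM    = c(3,2,3,3,3,4,3,2,3,3,4,3,4,3,2,3,2,2,3,2,3,3,3,2,3,1,3,2,2,4,4,3,2,3,4,2,3),
  Score = c(5,4,4,4,3,5,5,5,5,5,5,3,5,5,4,4,5,4,5,4,5,4,5,4,4,4,4,4,5,4,4,5,3,5,5,5,5)
)
# Declare every category that should appear on the axis, including empty ones
plotData$TM <- factor(plotData$TM, levels = 1:5)

# Hypothetical: column names as they might arrive from a dynamic script
xVar <- "TM"
yVar <- "Score"

# .data[[...]] looks the columns up by name inside plotData
p <- ggplot(plotData, aes(x = .data[[xVar]], y = .data[[yVar]], colour = .data[[xVar]])) +
  geom_jitter(alpha = 0.8, position = position_jitter(width = 0.2, height = 0.2)) +
  geom_boxplot(width = 0.75, alpha = 0.5) +
  scale_x_discrete(drop = FALSE) +  # keep unused factor levels on the axis
  theme_bw() +
  theme(legend.position = "none")
```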

1 Answer

Here's a workaround you could use:

library(ggplot2)

# generate dummy data 
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = TRUE),
                  y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y))+
    geom_boxplot(aes(fill = lets))+
    geom_point(data = NULL, aes(x = 'e', y = 0), pch = NA)+
    scale_fill_brewer(drop = FALSE, palette = 'Set1')+
    theme_bw()

[Image: boxplot with empty category e on the x-axis]

Basically, we plot an "empty" point (i.e. pch = NA) so that the category shows up on the x-axis but has no visible geom associated with it. We also define our discrete variable, lets, as a factor with five levels, even though only four are present in the data.frame. The missing category is the letter e.

NB: You'll have to adjust the positioning of this "empty" point so that it doesn't skew your y axis.
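A possible alternative to the invisible point (a substitution, not part of the original answer) is `expand_limits()`, which adds the category through a `geom_blank()` layer. That layer draws nothing and maps only x, so it should not affect the y-axis range at all. A sketch with the same dummy data:

```r
library(ggplot2)

set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = TRUE),
                  y = rnorm(20), stringsAsFactors = FALSE)
df1$lets <- factor(df1$lets, levels = letters[1:5])

p <- ggplot(df1, aes(x = lets, y = y)) +
  geom_boxplot(aes(fill = lets)) +
  # Substituted technique: a blank layer with x = "e" trains the x scale
  # without drawing anything or touching the y scale
  expand_limits(x = "e") +
  scale_fill_brewer(drop = FALSE, palette = "Set1") +
  theme_bw()
```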

Otherwise, you could use the result from this answer to avoid having to plot an "empty" point.

library(ggplot2)

# generate dummy data 
set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = TRUE),
                  y = rnorm(20), stringsAsFactors = FALSE)
# define factor, including the missing category as a level
df1$lets <- factor(df1$lets, levels = letters[1:5])
# make plot
ggplot(df1, aes(x = lets, y = y)) +
    geom_boxplot(aes(fill = lets)) +
    scale_x_discrete(drop = FALSE) +
    scale_fill_brewer(drop = FALSE, palette = 'Set1') +
    theme_bw()

[Image: boxplot with empty category e, drawn without the "empty" point]
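To confirm that `drop = FALSE` kept the empty level without eyeballing the figure, one option is to inspect the trained x-scale limits with `layer_scales()` (present in recent ggplot2 releases; treat the exact version as an assumption). This sketch rebuilds the dummy data and compares the default scale with `drop = FALSE`:

```r
library(ggplot2)

set.seed(123)
df1 <- data.frame(lets = sample(letters[1:4], 20, replace = TRUE),
                  y = rnorm(20), stringsAsFactors = FALSE)
df1$lets <- factor(df1$lets, levels = letters[1:5])

p_keep <- ggplot(df1, aes(x = lets, y = y)) +
  geom_boxplot() +
  scale_x_discrete(drop = FALSE)  # keep unused levels
p_drop <- ggplot(df1, aes(x = lets, y = y)) +
  geom_boxplot()                  # default discrete scale drops unused levels

# Categories each x-axis was trained on
limits_keep <- layer_scales(p_keep)$x$get_limits()
limits_drop <- layer_scales(p_drop)$x$get_limits()
print(limits_keep)  # includes "e"
print(limits_drop)  # "e" is absent
```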

Edited by Community; answered by bouncyball
  • BEAUTY! The trick was in factoring the x-axis categories (I was just assuming that as integers it would all work) AND that NULL data trick. Now I just have to roll this into the plot script to account for whichever category is actually missing, as sometimes it's 1, sometimes 5. – BarneyC Aug 23 '16 at 18:13
  • Which is really obvious: when initialising the plotData df, as per @aosmith's comment, just set the correct factor levels for the x-axis data. BINGO! – BarneyC Aug 23 '16 at 18:16
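Since the missing category varies between data frames (sometimes 1, sometimes 5), one option is a small helper that always declares the full set of categories before plotting, combined with `drop = FALSE` on the scales as in the accepted fix. The function name `with_all_levels()` and the `1:5` default are hypothetical, chosen to match this example:

```r
# Hypothetical helper: turn `col` into a factor with every expected category,
# so empty categories survive into the plot (used with drop = FALSE on scales)
with_all_levels <- function(df, col, all_levels = 1:5) {
  vals <- df[[col]]
  # Fail loudly if the data contains a category outside the expected set,
  # since factor() would otherwise silently turn it into NA
  unexpected <- setdiff(vals[!is.na(vals)], all_levels)
  if (length(unexpected) > 0) {
    stop("Values not in all_levels: ", paste(unexpected, collapse = ", "))
  }
  df[[col]] <- factor(vals, levels = all_levels)
  df
}

# Example: one data frame lacks categories 1 and 4, another lacks 3 and 5
dfA <- data.frame(TM = c(2, 3, 5), Score = c(4, 5, 3))
dfB <- data.frame(TM = c(1, 2, 4), Score = c(5, 4, 4))

print(levels(with_all_levels(dfA, "TM")$TM))  # all five levels
print(table(with_all_levels(dfB, "TM")$TM))   # category 5 has a count of 0
```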