I have a plot generated by ggplot2, which contains two legends. The placing of the legends is not ideal, so I would like to adjust them. I've been trying to imitate the method shown in the answer to "How do I position two legends independently in ggplot". The example shown in that answer works. However, I can't get the method described to work for my situation.

I'm using R 2.15.3 (2013-03-01), ggplot2_0.9.3.1, lattice_0.20-13, gtable_0.1.2, gridExtra_0.9.1 on Debian squeeze.

Consider the plot generated by minimal.R. This is similar to my actual plot.

########################
minimal.R
########################

get_stat <- function()
  {
    n = 20
    q1 = qnorm(seq(3, 17)/20, 14, 5)
    q2 = qnorm(seq(1, 19)/20, 65, 10)
    Stat = data.frame(value = c(q1, q2),
      pvalue = c(dnorm(q1, 14, 5)/max(dnorm(q1, 14, 5)), d = dnorm(q2, 65, 10)/max(dnorm(q2, 65, 10))),
      variable = c(rep('true', length(q1)), rep('data', length(q2))))
    return(Stat)
  }

stat_all<- function()
{
  library(ggplot2)
  library(gridExtra)
  stathuman = get_stat()
  stathuman$dataset = "human"
  statmouse = get_stat()
  statmouse$dataset = "mouse"
  stat = merge(stathuman, statmouse, all=TRUE)
  return(stat)
}

simplot <- function()
  {
    Stat = stat_all()
    Pvalue = subset(Stat, variable=="true")
    pdf(file = "CDF.pdf", width = 5.5, height = 2.7)
    stat = ggplot() + stat_ecdf(data=Stat, n=1000, aes(x=value, colour = variable)) +
      theme(legend.key = element_blank(), legend.background = element_blank(), legend.position=c(.9, .25), legend.title = element_text(face = "bold")) +
        scale_x_continuous("Negative log likelihood") +
          scale_y_continuous("Proportion $<$ x") +
            facet_grid(~ dataset, scales='free') +
              scale_colour_manual(values = c("blue", "red"), name="Data type",
                                  labels=c("Gene segments", "Model"), guide=guide_legend(override.aes = list(size = 2))) +
                geom_area(data=Pvalue, aes(x=value, y=pvalue, fill=variable), position="identity", alpha=0.5) +
                  scale_fill_manual(values = c("gray"), name="Pvalue", labels=c(""))
    print(stat)
    dev.off()
  }

simplot()

This results in the following plot. As can be seen, the Data type and Pvalue legends are not well positioned. I modified this code to give minimal2.R, listed at the end of this question.

[Figure: plot produced by minimal.R]

With Version 1, which should put the legend on the top, the code runs without error, but no legend is shown.

EDIT: Two boxes are displayed, one on top of the other, and the top one is blank. If I do not set the heights in grid.arrange(), as suggested by @baptiste, then the legend and the plot are both placed in the bottom box. If I set the heights as shown, then I don't see the legend.

EDIT2: It seems the extra blank box was produced by grid.newpage(), which I copied from the earlier question. I'm not sure why it was there. If I remove that line, I get just one box/page.

With Version 2, I get this error.

Error in UseMethod("grid.draw") :
  no applicable method for 'grid.draw' applied to an object of class "c('gg', 'ggplot')"
Calls: simplot -> grid.draw
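
The error is consistent with `grid.draw` having no method for ggplot objects in this ggplot2 version: it dispatches on grid grobs, so a plot has to be converted to a grob first (or simply printed). A minimal sketch, assuming a ggplot2 version that provides `ggplotGrob()`; the `mtcars` plot is only a stand-in for `plotNew`, and the temporary PDF file is an illustrative choice:

```r
library(ggplot2)
library(grid)

# Stand-in plot (hypothetical; replaces plotNew for illustration only).
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# ggplotGrob() builds the plot and returns a gtable, which is a grid grob.
g <- ggplotGrob(p)

# Draw the grob on a throwaway PDF device.
pdf(file = tempfile(fileext = ".pdf"), width = 5.5, height = 2.7)
grid.newpage()
grid.draw(g)
invisible(dev.off())
```

Drawing the converted grob and calling `print(p)` should give the same picture; the grob form is mainly useful when the plot has to be combined with other grobs.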

EDIT: If I use print(plotNew) as suggested by @baptiste, then I get the following error

Error in if (empty(data)) { : missing value where TRUE/FALSE needed 
Calls: simplot ... facet_map_layout -> facet_map_layout.grid -> locate_grid.

I tried to figure out what is going on here, but I could not find much relevant information.

NOTES:

  1. I'm not sure why I'm getting the staircase effect for the empirical CDF. I'm sure there is an obvious explanation. Please enlighten me if you know.

  2. I'm willing to consider alternatives to this code and even ggplot2 for producing this graph, if anyone can suggest alternatives, e.g. matplotlib, which I have never seriously experimented with.

  3. Adding

    print(ggplot_gtable(ggplot_build(stat2)))
    

    to minimal2.R gives me

    TableGrob (7 x 7) "layout": 12 grobs
        z     cells       name                                 grob
    1   0 (1-7,1-7) background       rect[plot.background.rect.186]
    2   1 (3-3,4-4)  strip-top absoluteGrob[strip.absoluteGrob.135]
    3   2 (3-3,6-6)  strip-top absoluteGrob[strip.absoluteGrob.141]
    4   5 (4-4,3-3)     axis-l  absoluteGrob[GRID.absoluteGrob.129]
    5   3 (4-4,4-4)      panel                gTree[GRID.gTree.155]
    6   4 (4-4,6-6)      panel                gTree[GRID.gTree.169]
    7   6 (5-5,4-4)     axis-b  absoluteGrob[GRID.absoluteGrob.117]
    8   7 (5-5,6-6)     axis-b  absoluteGrob[GRID.absoluteGrob.123]
    9   8 (6-6,4-6)       xlab          text[axis.title.x.text.171]
    10  9 (4-4,2-2)       ylab          text[axis.title.y.text.173]
    11 10 (4-4,4-6)  guide-box                    gtable[guide-box]
    12 11 (2-2,4-6)      title            text[plot.title.text.184]
    

    I don't understand this breakdown. Can anyone explain? Does guide-box correspond to the legend, and how does one know this?
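
On reading that table: each row is one grob placed in the plot's 7 x 7 layout grid, `cells` gives the rows and columns it spans, and `name` is the label ggplot2 assigns to that slot. `guide-box` is the label ggplot2 uses for the legend container. A small sketch for looking the legend slot up by name, assuming a stand-in `mtcars` plot; the pattern match is an assumption to cover ggplot2 versions that use suffixed names such as `guide-box-right`:

```r
library(ggplot2)

# Stand-in faceted plot with one colour legend (hypothetical example data).
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  facet_grid(~ am)

g <- ggplotGrob(p)

# The layout data frame has one row per grob: top/left/bottom/right cell
# indices plus the slot name.
layout_names <- g$layout$name
legend_rows <- g$layout[grepl("^guide-box", layout_names),
                        c("t", "l", "b", "r", "name")]
print(legend_rows)
```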

Here is the modified version of my code, minimal2.R.

########################
minimal2.R
########################

get_stat <- function()
  {
    n = 20
    q1 = qnorm(seq(3, 17)/20, 14, 5)
    q2 = qnorm(seq(1, 19)/20, 65, 10)
    Stat = data.frame(value = c(q1, q2),
      pvalue = c(dnorm(q1, 14, 5)/max(dnorm(q1, 14, 5)), d = dnorm(q2, 65, 10)/max(dnorm(q2, 65, 10))),
      variable = c(rep('true', length(q1)), rep('data', length(q2))))
    return(Stat)
  }

stat_all<- function()
{
  library(ggplot2)
  library(gridExtra)
  library(gtable)
  stathuman = get_stat()
  stathuman$dataset = "human"
  statmouse = get_stat()
  statmouse$dataset = "mouse"
  stat = merge(stathuman, statmouse, all=TRUE)
  return(stat)
}

simplot <- function()
  {
    Stat = stat_all()
    Pvalue = subset(Stat, variable=="true")
    pdf(file = "CDF.pdf", width = 5.5, height = 2.7)

    ## only include data type legend
    stat1 = ggplot() + stat_ecdf(data=Stat, n=1000, aes(x=value, colour = variable)) +
      theme(legend.key = element_blank(), legend.background = element_blank(), legend.position=c(.9, .25), legend.title = element_text(face = "bold")) +
        scale_x_continuous("Negative log likelihood") +
          scale_y_continuous("Proportion $<$ x") +
            facet_grid(~ dataset, scales='free') +
              scale_colour_manual(values = c("blue", "red"), name="Data type", labels=c("Gene segments", "Model"), guide=guide_legend(override.aes = list(size = 2))) +
                geom_area(data=Pvalue, aes(x=value, y=pvalue, fill=variable), position="identity", alpha=0.5) +
                  scale_fill_manual(values = c("gray"), name="Pvalue", labels=c(""), guide=FALSE)

    ## Extract data type legend
    dataleg <- gtable_filter(ggplot_gtable(ggplot_build(stat1)), "guide-box")

    ## only include pvalue legend
    stat2 = ggplot() + stat_ecdf(data=Stat, n=1000, aes(x=value, colour = variable)) +
      theme(legend.key = element_blank(), legend.background = element_blank(), legend.position=c(.9, .25), legend.title = element_text(face = "bold")) +
        scale_x_continuous("Negative log likelihood") +
          scale_y_continuous("Proportion $<$ x") +
            facet_grid(~ dataset, scales='free') +
              scale_colour_manual(values = c("blue", "red"), name="Data type", labels=c("Gene segments", "Model"), guide=FALSE) +
                geom_area(data=Pvalue, aes(x=value, y=pvalue, fill=variable), position="identity", alpha=0.5) +
                  scale_fill_manual(values = c("gray"), name="Pvalue", labels=c(""))

    ## Extract pvalue legend
    pvalleg <- gtable_filter(ggplot_gtable(ggplot_build(stat2)), "guide-box")

    ## no legends
    stat = ggplot() + stat_ecdf(data=Stat, n=1000, aes(x=value, colour = variable)) +
      theme(legend.key = element_blank(), legend.background = element_blank(), legend.position=c(.9, .25), legend.title = element_text(face = "bold")) +
        scale_x_continuous("Negative log likelihood") +
          scale_y_continuous("Proportion $<$ x") +
            facet_grid(~ dataset, scales='free') +
              scale_colour_manual(values = c("blue", "red"), name="Data type", labels=c("Gene segments", "Model"), guide=FALSE) +
                geom_area(data=Pvalue, aes(x=value, y=pvalue, fill=variable), position="identity", alpha=0.5) +
                  scale_fill_manual(values = c("gray"), name="Pvalue", labels=c(""), guide=FALSE)

    ## Uncomment exactly one of the two versions below to define plotNew.
    ## Add data type legend: version 1 (data type legend should be on top)
    ## plotNew <- arrangeGrob(dataleg, stat, heights = unit.c(dataleg$height, unit(1, "npc") - dataleg$height), ncol = 1)

    ## Add data type legend: version 2 (data type legend should be somewhere in the interior)
    ## plotNew <- stat + annotation_custom(grob = dataleg, xmin = 7, xmax = 10, ymin = 0, ymax = 4)

    grid.newpage()
    grid.draw(plotNew)
    dev.off()
  }

simplot()
Faheem Mitha
  • I am interested in an answer to the question itself, but independent of that: visually, could you just label the lines directly rather than use a legend? – Tyler Rinker May 11 '13 at 22:00
  • @TylerRinker: I'm not sure what you have in mind. Can you elaborate? – Faheem Mitha May 11 '13 at 22:02
  • Not that the direct labels package is the only way to accomplish this but something like [this](https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcQ4Qoq19dCE6mjhpmT5BngK6csAWQL__ZwQPredqteXt3uO0j2j). This reduces the amount of searching on the eye, reducing processing demands, meaning the data's the one taking up space in the working memory, not what the colors of the lines mean. – Tyler Rinker May 11 '13 at 22:07
  • @TylerRinker: I see what you mean. Though with two similar plots there would be duplication. In any case, do you know how to do this within ggplot2? – Faheem Mitha May 11 '13 at 22:20
  • with version 2, you end up with a ggplot2, so you can simply `print()` it (grid.draw doesn't know what to do and returns an error). – baptiste May 11 '13 at 22:47
  • as for version 1, I would try it without setting the heights in `grid.arrange()` to start with (I haven't tried your code) – baptiste May 11 '13 at 22:48
  • @baptiste Thanks for the tips. I'll experiment tomorrow. – Faheem Mitha May 11 '13 at 22:49
  • @Faheem Mitha you can use the direct labels package or do the labels by hand creating a separate 4 row data frame as I have detailed in [this blog post](http://trinkerrstuff.wordpress.com/2012/09/01/add-text-annotations-to-ggplot2-faceted-plot/) – Tyler Rinker May 11 '13 at 23:11
  • @baptiste: For version 2, replacing `grid.draw` with `print(plotNew)` gives the following error: `Error in if (empty(data)) { : missing value where TRUE/FALSE needed Calls: simplot ... facet_map_layout -> facet_map_layout.grid -> locate_grid`. For version 1, not setting the height sort of works. It produces two boxes on top of each other, but the top one is empty, and the bottom one contains the legend *and* the plot. – Faheem Mitha May 12 '13 at 08:32

1 Answer

It can be done with grid.arrange and arrangeGrob, but it's a pain to adjust the heights and widths correctly.

grid.arrange(arrangeGrob(dataleg, pvalleg, nrow=1, ncol=2, widths=c(unit(1, "npc"), unit(5, "cm"))), stat, nrow=2, heights=c(unit(.2, "npc"), unit(.8, "npc")))

I usually prefer to make a new plot with an appropriate legend and use this new legend:

 h <- ggplot(data.frame(a=rnorm(10), b=rnorm(10), c=factor(rbinom(10, 1,.5), labels=c("Gene segments", "Model")), d=factor("")), 
        aes(x=a, y=b)) +
   geom_line(aes(color=c), size=1.3) + geom_polygon(aes(fill=d)) +
   scale_color_manual(values=c("blue", "red"), name="Data type") + 
   scale_fill_manual(values="gray", name="P-value") 
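 # Return the legend ("guide-box") grob of a ggplot object.
 g_legend <- function(a.gplot) {
   tmp <- ggplot_gtable(ggplot_build(a.gplot))
   # index of the grob named "guide-box" in the plot's gtable
   leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
   legend <- tmp$grobs[[leg]]
   return(legend)
 }
 legend <- g_legend(h)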

 # Legend to the right of the plot
 grid.arrange(stat, legend, nrow=1, ncol=2, widths=c(unit(.8, "npc"), unit(.2, "npc")))
 # Alternatively, legend above the plot
 grid.arrange(legend, stat, nrow=2, ncol=1, heights=c(unit(.2, "npc"), unit(.8, "npc")))
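
For readers who want to try the extract-and-stack idea end to end, here is a self-contained sketch under stated assumptions: it uses the built-in `mtcars` data rather than the question's data, looks the legend up by layout name (matching `guide-box` with any suffix, since newer ggplot2 versions may use several such slots), and uses plain relative heights `c(1, 4)` instead of tuned units. It is an illustration, not the exact code above.

```r
library(ggplot2)
library(grid)
library(gridExtra)

# Stand-in plot with a horizontal legend placed on top (hypothetical data).
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  theme(legend.position = "top")

# Find the legend container in the plot's gtable, skipping empty slots.
g <- ggplotGrob(p)
idx <- grep("^guide-box", g$layout$name)
is_real <- !vapply(g$grobs[idx], inherits, logical(1), what = "zeroGrob")
legend <- g$grobs[[idx[is_real][1]]]

# Same plot without its legend, converted to a grob.
plot_only <- ggplotGrob(p + theme(legend.position = "none"))

# Stack legend over plot; numeric heights are treated as relative sizes.
combined <- arrangeGrob(legend, plot_only, ncol = 1, heights = c(1, 4))

pdf(file = tempfile(fileext = ".pdf"), width = 5.5, height = 2.7)
grid.draw(combined)
invisible(dev.off())
```

Relative heights avoid mixing `npc` and absolute units, at the cost of the legend row not being sized to the legend itself; the ratio usually needs adjusting per figure.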