I would like to use `do()` to make multiple ggplots from a grouped data frame, but with one alteration to each plot: reversing the y-axis if a column contains a particular value.

I modelled my approach after Hadley's answer to this question: dplyr::do() requires named function?

The problem I'm having is getting the gg object into the data frame so that I can return it. How do I do manually what `do()` did automatically in my working example below, and 'wrap' the gg object in something that can be placed into a data frame?

library(dplyr)
library(ggplot2)

df <- data.frame(
    position = rep(seq(0, 99), 2),
    strand   = c(rep("-", 100), rep("+", 100)),
    score    = rnorm(200),
    gene     = c(rep("alpha", 100), rep("beta", 100))
)

This works fine:

plots <- df %>% 
    group_by(gene) %>%
    do(plot=
        ggplot(.,aes(position,score)) +
            geom_point()
    )
plots   

Result:

# A tibble: 2 x 2
  gene  plot    
* <fct> <list>  
1 alpha <S3: gg>
2 beta  <S3: gg>

This does not:

plots <- df %>% 
    group_by(gene) %>%
    do({
        plot <- ggplot(.,aes(position,score)) +
            geom_point()

        if (all(.$strand=="-")) {
            plot <- plot + scale_y_reverse()
        }
        data.frame(., plot) ##!! <<< how to get the ggplot object into a data frame
    })
plots

Fails with the error:

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class "c("gg", "ggplot")" to a data.frame
Richard J. Acton

2 Answers

I don't think you need the return value to be a frame. Try this:

plots <- df %>% 
    group_by(gene) %>%
    do(plot= {
        p <- ggplot(.,aes(position,score)) +
            geom_point()
        if (all(.$strand == "-")) p <- p + scale_y_reverse()
        p
    })
plots
# Source: local data frame [2 x 2]
# Groups: <by row>
# # A tibble: 2 x 2
#   gene  plot    
# * <fct> <list>  
# 1 alpha <S3: gg>
# 2 beta  <S3: gg>

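I think your conditional logic is fine; the issue is that you did not name the block within do(...). An unnamed block has to return a data frame, whereas the result of a named argument (plot = ...) is stored in a list-column, which can hold a ggplot object.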

You can view one of them with:

plots$plot[[1]]

If you want to dump all plots (e.g., in a markdown document), just do plots$plot and they will be cycled through rather quickly (not as useful on the console).
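
To make the "wrapping" less magical, here is a rough sketch of building the same kind of list-column by hand, without do(). It assumes the toy df from the question (rebuilt here so the snippet stands alone), and make_plot is just a hypothetical helper name, not something from ggplot2 or dplyr.

```r
library(ggplot2)
library(tibble)

# Same toy data as in the question
df <- data.frame(
    position = rep(seq(0, 99), 2),
    strand   = c(rep("-", 100), rep("+", 100)),
    score    = rnorm(200),
    gene     = c(rep("alpha", 100), rep("beta", 100))
)

# make_plot is a hypothetical helper: one scatter plot per group,
# with the y-axis reversed when every row is on the "-" strand
make_plot <- function(d) {
    p <- ggplot(d, aes(position, score)) +
        geom_point()
    if (all(d$strand == "-")) p <- p + scale_y_reverse()
    p
}

# Split into one data frame per gene, then build one plot per piece
by_gene   <- split(df, df$gene)
plot_list <- lapply(by_gene, make_plot)

# A plain list is a valid column: each cell holds one whole ggplot object
plots <- tibble(gene = names(by_gene), plot = unname(plot_list))
```

Each plot is then available as plots$plot[[i]], the same as with the do() version. The point is only that the ggplot objects are never coerced into columns themselves; they sit inside a list, and the list is the column.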

r2evans
  • Thanks, @r2evans - I thought I had tried naming the block in my original code before I put together the toy example for SO; I must have introduced a different error while editing. I would still be interested to know whether there is a way of putting a ggplot object directly into a data frame, as that seems like it might be useful at some point. – Richard J. Acton Oct 05 '18 at 15:26
  • @RichardJ.Acton Using a nested data frame is a good way to put a ggplot object into a data frame. The key is to use a list-column. – acylam Oct 05 '18 at 15:28
  • Further, RichardJ, a gg object cannot be converted into a simple frame. It might be convertible into a nested frame, but this would likely break the functionality. It might be informative to run `str(plots$plot[[1]])` to see what a single ggplot object resembles; in this example, it's a 9-element list, elements include a single-element list; a 100-row data.frame; a list of functions; and an environment. Not easily converted into a single/simple frame. Keeping a list-column is the way to go (and works well, imo). – r2evans Oct 05 '18 at 15:55
  • Thanks @r2evans, I think the key concept is list-columns. I was wondering how `do` 'packaged' the (as you point out) quite complex gg objects so that you get a data frame with a 'pointer' to the S3 object as a cell in the data frame. I was lacking the terminology and can now do some reading on list-columns. – Richard J. Acton Oct 05 '18 at 16:57

We can use a nested data frame instead of do:

library(ggplot2)
library(tidyverse)

plots <- df %>%
  group_by(gene) %>%
  nest() %>%
  mutate(plots = data %>% map(~{
    plot <- ggplot(.,aes(position,score)) +
      geom_point()

    if (all(.$strand=="-")) {
      plot <- plot + scale_y_reverse()
    }
    return(plot)
  })) %>%
  select(-data) 

Output:

# A tibble: 2 x 2
  gene  plots   
  <fct> <list>  
1 alpha <S3: gg>
2 beta  <S3: gg>

acylam