
I am trying to call a plotting function on subgroups of a data.frame with dplyr::do(), producing one figure (ggplot object) per subgroup. I want each figure's title to be based on the value of the grouping variable. To do this, my function needs to know what the grouping variable is.

Currently, what gets passed to do() as `.` is an object of class tbl_df and data.frame. Without explicitly passing the grouping variable(s) as a separate argument, is there a way to inspect the data.frame directly to learn what the grouping variable(s) is/are?
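A quick way to inspect what `.` actually carries inside do() is to report, for each subgroup, whether it is still a grouped_df and which group variables it claims. This is a sketch assuming a recent dplyr (where do() is superseded but still available); there it should report that `.` is not grouped and has no group variables, even though the manufacturer column itself is still present.

```r
library(dplyr)
library(ggplot2)  # for the mpg data set

# For each subgroup, record what `.` knows about its own grouping
check <- mpg %>%
  group_by(manufacturer) %>%
  do(tibble(
    is_grouped    = is_grouped_df(.),           # is `.` still a grouped_df?
    n_group_vars  = length(group_vars(.)),      # how many group vars it reports
    has_group_col = "manufacturer" %in% names(.) # is the column still there?
  ))
check
```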

The solutions posted here call for explicitly passing (each of) the grouping variables as an additional argument to the function. I'm wondering if there is a more elegant and general solution that scales to varying numbers of grouping variables. While in this specific instance I'm interested in plotting, there are other use cases where I want to know how the subgroups are defined from within the function called on each subgroup.

I don't want to guess by looking for columns where length(unique(col)) == 1, because that would produce lots of false positives with my data.
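To make the false-positive problem concrete, here is a small sketch with made-up toy data. The helper constant_cols() and the columns g, colour and value are hypothetical: `g` is the real grouping variable, but `colour` happens to be constant within every group too, so the length(unique(col)) == 1 heuristic cannot tell them apart.

```r
library(dplyr)

# Hypothetical helper implementing the rejected heuristic:
# names of columns that hold a single unique value
constant_cols <- function(df) {
  names(df)[vapply(df, function(col) length(unique(col)) == 1, logical(1))]
}

# Toy data (made up): only `g` is the grouping variable
toy <- tibble(
  g      = c("a", "a", "b", "b"),
  colour = c("red", "red", "blue", "blue"),
  value  = c(1, 2, 3, 4)
)

# Inside do(), both `g` and `colour` look like grouping variables
guesses <- toy %>%
  group_by(g) %>%
  do(tibble(guessed = paste(constant_cols(.), collapse = ", ")))
guesses
```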

Is there an elegant way to do this?

Here is some sample code to get started.

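library(ggplot2)
library(dplyr)  # provides %>%, group_by() and do()

my_plot <- function(df) {
  subgroup_name <- "" # ?? how do I recover the grouping value(s) here?
  ggplot(df, aes(cty, hwy)) + geom_point() +
    ggtitle(subgroup_name)
}

mpg %>%
  group_by(manufacturer) %>%
  do(my_plots = my_plot(.))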
t-kalinowski
You have tried `facet_wrap` already, right? If you really need separate plots, you could `lapply` your plotting function across your data.frame `split` by the grouping variables, though there are certainly other approaches. – alistaire May 11 '16 at 17:04
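The split()/lapply() approach from that comment might look like this; a minimal sketch, not code from the thread. The useful property is that split() returns a named list, so the subgroup labels come for free from names(). With several grouping variables, split() combines them via interaction(); the `sep` argument controls the separator in the names and drop = TRUE discards combinations that never occur.

```r
library(ggplot2)  # for ggplot() and the mpg data set

# One data frame per manufacturer; names(pieces) are the manufacturer values
pieces <- split(mpg, mpg$manufacturer)

# Iterate over the names so each plot can be titled by its subgroup
plots <- lapply(names(pieces), function(nm) {
  ggplot(pieces[[nm]], aes(cty, hwy)) + geom_point() + ggtitle(nm)
})
names(plots) <- names(pieces)

# Several grouping variables: names become e.g. "audi 1999"
pieces2 <- split(mpg, mpg[c("manufacturer", "year")], sep = " ", drop = TRUE)
plots2 <- lapply(names(pieces2), function(nm) {
  ggplot(pieces2[[nm]], aes(cty, hwy)) + geom_point() + ggtitle(nm)
})
```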

1 Answer


I don't think it's possible to do this without passing the names of the grouping variable(s) into the function: I believe the grouped_df's "vars" attribute (which records the grouping variables) is dropped when the data is split into subgroups, before do() runs. Here's an alternative that defines the grouping variable(s) in a vector once, before the group_by() %>% do() chain, and passes that same vector to the function:

library(ggplot2)
library(dplyr)

my_plot <- function(df, group_vars) {

    # get plot name from value(s) in grouping variable(s)
    subgroup_name <- paste(df[1, group_vars], collapse = " ")

    ggplot(data = df, aes(cty, hwy)) + geom_point() + ggtitle(subgroup_name)

}


group1 <- "manufacturer"
plots1 <- 
    mpg %>% 
    group_by(across(all_of(group1))) %>%  # group_by_(.dots = group1) in older dplyr
    do(my_plots = my_plot(., group1))
plots1$my_plots[[1]]


group2 <- c("manufacturer", "year")
plots2 <- 
    mpg %>% 
    group_by(across(all_of(group2))) %>%  # group_by_(.dots = group2) in older dplyr
    do(my_plots = my_plot(., group2))
plots2$my_plots[[2]]

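For readers on newer dplyr: releases from 0.8.0 on (after this answer was written) added group_map(), which calls the function as .f(.x, .y), where .y is a one-row tibble holding the current group's key values. That lets the function build its title without being told which columns were used for grouping. A minimal sketch; my_plot_keys is a hypothetical name chosen so it doesn't clash with my_plot above.

```r
library(dplyr)
library(ggplot2)

# .x (df): the subgroup's rows, grouping columns dropped by default
# .y (keys): one-row tibble of this group's key values
my_plot_keys <- function(df, keys) {
  # as.character() per column keeps factor labels rather than integer codes
  subgroup_name <- paste(vapply(keys, as.character, character(1)), collapse = " ")
  ggplot(df, aes(cty, hwy)) + geom_point() + ggtitle(subgroup_name)
}

# The same function works for any number of grouping variables
plots3 <- mpg %>%
  group_by(manufacturer, year) %>%
  group_map(my_plot_keys)
```

group_map() returns a plain list of plots, one per group, in group order.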

Lorenz D