
Suppose I have the following function:

SlowFunction <- function(vector) {
  return(list(
    mean = mean(vector),
    sd   = sd(vector)
  ))
}

And I would like to use dplyr::summarise to write the results to a data frame:

iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::summarise(
    mean = SlowFunction(Sepal.Length)$mean,
    sd   = SlowFunction(Sepal.Length)$sd
    )

Is there a way to do this while calling SlowFunction only once per group instead of twice, without splitting SlowFunction into two functions? In my real code, SlowFunction is slow and I have to call it many times. In other words, I would like to fill multiple columns of a data frame in one statement.

Frank

4 Answers


Without changing your current SlowFunction, one way is to use do():

library(dplyr)

iris %>% 
   group_by(Species) %>% 
   do(data.frame(SlowFunction(.$Sepal.Length)))

#  Species     mean    sd
#  <fct>      <dbl> <dbl>
#1 setosa      5.01 0.352
#2 versicolor  5.94 0.516
#3 virginica   6.59 0.636

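Or with group_split() and purrr::map_dfr():

library(purrr)

# Note: this relies on unique() and group_split() yielding the species
# in the same order, which holds for iris.
bind_cols(
  Species = unique(iris$Species),
  iris %>%
    group_split(Species) %>%
    map_dfr(~ SlowFunction(.$Sepal.Length))
)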
Ronak Shah

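Another option is to store the output of SlowFunction in a list column of data frames and then expand it with tidyr::unnest(). Naming the column explicitly avoids the warning that tidyr 1.0.0 and later give for a bare unnest() call:

library(tidyr)

iris %>%
    group_by(Species) %>%
    summarise(res = list(as.data.frame(SlowFunction(Sepal.Length)))) %>%
    unnest(res)
# A tibble: 3 x 3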
#  Species     mean    sd
#  <fct>      <dbl> <dbl>
#1 setosa      5.01 0.352
#2 versicolor  5.94 0.516
#3 virginica   6.59 0.636
Maurits Evers
  • Thanks, I've compared the answers and this one is in my case by far the fastest. It is 2x faster than using "do" and 2.5x faster than using group_map! – Frank Apr 12 '19 at 23:36

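We can apply SlowFunction once per group with group_modify. In dplyr 0.8.0 this function was called group_map; from dplyr 0.8.1 onward group_map returns a plain list, and group_modify is the variant that returns a grouped tibble. The output from SlowFunction needs to be converted to a data frame.

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  group_modify(~SlowFunction(.x$Sepal.Length) %>% as.data.frame())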
# # A tibble: 3 x 3
# # Groups:   Species [3]
#   Species     mean    sd
#   <fct>      <dbl> <dbl>
# 1 setosa      5.01 0.352
# 2 versicolor  5.94 0.516
# 3 virginica   6.59 0.636
www

We can change SlowFunction to return a tibble:

SlowFunction <- function(vector) {
  tibble(
    mean = mean(vector),
    sd   = sd(vector)
  )
}

and then wrap its result in a list column inside summarise and unnest that column:

iris %>% 
    group_by(Species) %>% 
    summarise(out = list(SlowFunction(Sepal.Length))) %>%
    tidyr::unnest(out)
# A tibble: 3 x 3
#  Species     mean    sd
#  <fct>      <dbl> <dbl>
#1 setosa      5.01 0.352
#2 versicolor  5.94 0.516
#3 virginica   6.59 0.636
akrun
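
As a side note to the answers above: assuming dplyr 1.0.0 or later, summarise() unpacks an unnamed expression that returns a one-row data frame into separate columns, so the list column and unnest step may not be needed. The sketch below is only an illustration under that assumption. The wrapper `slow_as_tibble` and the counter `calls` are hypothetical names that do not appear in the question; the counter is there just to check how often SlowFunction runs.

```r
library(dplyr)

# Counter used only to check how many times SlowFunction runs (hypothetical).
calls <- 0

# Same return value as the SlowFunction in the question, plus the counter.
SlowFunction <- function(vector) {
  calls <<- calls + 1
  list(mean = mean(vector), sd = sd(vector))
}

# Hypothetical wrapper: turns the returned list into a one-row tibble,
# so SlowFunction itself keeps its original return type.
slow_as_tibble <- function(vector) {
  tibble::as_tibble(SlowFunction(vector))
}

# Assumes dplyr >= 1.0.0: the unnamed data frame result is spread
# into the columns `mean` and `sd`.
result <- iris %>%
  group_by(Species) %>%
  summarise(slow_as_tibble(Sepal.Length))

result
calls  # number of SlowFunction calls made by the pipeline above
```

With three species in iris, `calls` should end up equal to the number of groups rather than twice that, which is what the question asks for.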