
Suppose we start with the following:

library(dplyr)
library(magrittr)
library(tibble)

set.seed(123)

tbl <- tibble(value = rnorm(100), class = rep(LETTERS[1:5], each = 20))  # tibble() replaces the deprecated data_frame()

I'd like to write a function summarise_means(data, values, groupby) which, given tbl, "value" and "class", returns the same output as the following code:

tbl %>%
    group_by(class) %>%
    summarise(mean(value))

My first attempt was:

summarise_means <- function(data, values, groupby) {
  data %>%
    group_by(groupby) %>%
    summarise(mean(values))
}

This, of course, failed with:

Error: unknown variable to group by : groupby 

After a bit of digging, I determined that I ought to be using the standard-evaluation functions group_by_ and summarise_, but I suspect I am using them incorrectly, since this still doesn't work:

summarise_means <- function(data, values, groupby) {
  data %>%
    group_by_(groupby) %>%
    summarise_(mean(values))
}

When I call summarise_means(tbl, 'value', 'class'), I get:

# A tibble: 5 x 2
  class NA_real_
  <chr>    <dbl>
1     A       NA
2     B       NA
3     C       NA
4     D       NA
5     E       NA
Warning message:
In mean.default(values) : argument is not numeric or logical: returning NA

I don't really understand what's going wrong here. Any help is greatly appreciated!
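A minimal base-R sketch of why the second attempt returns NA. Inside the function, mean(values) is evaluated immediately on the character string "value", so what actually reaches summarise_() is just NA_real_ (which is also why the result column is named NA_real_). summarise_() needs the expression itself, for example the string "mean(value)". The variable names received and wanted are illustrative only.

```r
values <- "value"  # what the caller passed in

# mean() is applied to the string "value" right away; it is not numeric,
# so mean.default() warns and returns NA_real_ before dplyr ever sees it
received <- suppressWarnings(mean(values))
print(received)

# What summarise_() expects instead: the whole expression, as a string
wanted <- sprintf("mean(%s)", values)
print(wanted)
```

Building the string this way (for example summarise_(sprintf("mean(%s)", values)) inside the function) should let the original summarise_means(tbl, 'value', 'class') call work with the underscore verbs.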

crf

Could also use `summarise_(interp(~ mean(var), var = as.name(values)))` from the lazyeval package. Then your function will work with your current input. – David Arenburg Jul 18 '16 at 20:20

1 Answer


summarise_() takes the whole expression, i.e. the function together with its argument, as a string, so pass 'mean(value)' rather than just 'value':

summarise_means <- function(data, values, groupby) {
  data %>%
    group_by_(groupby) %>%
    summarise_(Mean = values)  # `values` is an expression string, e.g. 'mean(value)'
}

summarise_means(tbl, 'mean(value)', 'class')

# A tibble: 5 x 2
  class        Mean
  <chr>       <dbl>
1     A  0.14162380
2     B -0.05125716
3     C  0.10648523
4     D -0.11991706
5     E  0.37509474
Sumedh
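Note that this answer changes the interface: the caller now passes 'mean(value)' instead of 'value'. The underscore verbs group_by_() and summarise_() were also deprecated in dplyr 0.7.0. A minimal sketch that keeps the original summarise_means(tbl, "value", "class") call, assuming dplyr >= 1.0 (for across() inside group_by() and the .groups argument) and using the .data pronoun to look up a column from a string; the output column name mean_value is an arbitrary choice.

```r
library(dplyr)
library(tibble)

# Sketch assuming dplyr >= 1.0: column names are passed as strings
summarise_means <- function(data, values, groupby) {
  data %>%
    # all_of() selects the grouping column(s) named in the string(s)
    group_by(across(all_of(groupby))) %>%
    # .data[[values]] looks up the column whose name is stored in `values`
    summarise(mean_value = mean(.data[[values]]), .groups = "drop")
}

set.seed(123)
tbl <- tibble(value = rnorm(100), class = rep(LETTERS[1:5], each = 20))

res <- summarise_means(tbl, "value", "class")
print(res)
```

Because groupby goes through all_of(), the same function should also accept a character vector of several grouping columns.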