I'm using dplyr to aggregate a column of a data frame, as shown below.

> data <- data.frame(a=rep(1:2,3), b=c(6:11))
> data
  a  b
1 1  6
2 2  7
3 1  8
4 2  9
5 1 10
6 2 11
> data %>% group_by(a) %>% summarize(tot=sum(b))
# A tibble: 2 x 2
      a   tot
  <int> <int>
1     1    24
2     2    27

This is perfect. However, I want to create a reusable function for this, so that the column name can be passed as an argument.

Based on answers to related questions, I tried the following.

sumByColumn <- function(df, colName) {
  df %>%
  group_by(a) %>%
  summarize(tot=sum(colName))
  df
}

However, I'm not able to get it working.

> sumByColumn(data, "b")

 Error in summarise_impl(.data, dots) : 
  Evaluation error: invalid 'type' (character) of argument. 

> sumByColumn(data, b)

 Error in summarise_impl(.data, dots) : 
  Evaluation error: object 'b' not found. 
Konrad Rudolph
user3206440
  • You should consult [this blog post about programming with `dplyr`](http://dplyr.tidyverse.org/articles/programming.html) – bouncyball Jan 02 '18 at 13:54

4 Answers


This works with the latest dplyr syntax (as documented on GitHub). Convert the column name string to a symbol with sym() and unquote it with !!:

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))
}

sumByColumn(data, "b")
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

Alternatively, to pass b as a bare (unquoted) column name, capture the argument with enquo() and unquote it with !!:

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27
LyzandeR
  • This works. However, if I add `filter(!!myenc > 7)` before `group_by`, it doesn't return any rows. What would be the right way to specify the column name inside `filter()`? – user3206440 Jan 03 '18 at 01:50
  • This is part of the documentation. Instead of `!!` (which is a convenience function and which is not working with logical vectors), use `UQ` which is the proper function. i.e. `filter(UQ(myenc) > 7)`. Then it works fine. – LyzandeR Jan 03 '18 at 09:21
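
For the filter() case raised in the comments, here is a minimal sketch of one way to avoid any ambiguity in how `!!myenc > 7` is parsed: wrap the unquoted column in parentheses. Older rlang releases parsed that expression as `!!(myenc > 7)`, which is the likely cause of the empty result. The function name `sumByColumnAbove` and its `threshold` argument are hypothetical names introduced for illustration; they are not part of the original answer.

```r
library(dplyr)

data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Hypothetical helper: keep rows where the chosen column exceeds `threshold`,
# then sum that column within each group of `a`.
sumByColumnAbove <- function(df, colName, threshold) {
  colQuo <- enquo(colName)                # capture the bare column name
  df %>%
    filter((!!colQuo) > threshold) %>%    # parentheses make the unquoting explicit
    group_by(a) %>%
    summarize(tot = sum(!!colQuo))
}

result <- sumByColumnAbove(data, b, 7)
```

With the example data, only the rows where b is 8, 9, 10 and 11 survive the filter, so the totals are computed over those four rows.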

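With rlang 0.4.0 or later, we can use the curly-curly operator {{ }}, which captures and unquotes the argument in one step: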

library(dplyr)

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot=sum({{colName}}))
}

sumByColumn(data, b)

#      a   tot
#  <int> <int>
#1     1    24
#2     2    27
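
The same operator can be applied to the grouping column, so that neither column is hard-coded. This is a sketch under the assumption that both arguments are passed as bare column names; `sumByGroup` and `groupCol` are hypothetical names, not from the answer above.

```r
library(dplyr)

data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Hypothetical generalisation: both the grouping column and the summed
# column are passed as bare names and forwarded with {{ }}.
sumByGroup <- function(df, groupCol, colName) {
  df %>%
    group_by({{ groupCol }}) %>%
    summarize(tot = sum({{ colName }}))
}

result <- sumByGroup(data, a, b)
```

Called as `sumByGroup(data, a, b)`, this should reproduce the totals shown above for the fixed grouping column.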
AndrewGB

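dplyr also provides the scoped helper summarise_at(), which takes the columns via vars() and the functions via funs(). Note that funs() has been deprecated since dplyr 0.8.0, and summarise_at() is superseded by across() in dplyr 1.0.0 and later.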

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize_at(vars(colName), funs(tot = sum))
}

This gives the same result:

# A tibble: 2 x 2
#       a   tot
#   <int> <int>
# 1     1    24
# 2     2    27
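
In dplyr 1.0.0 and later, summarise_at() is superseded by across(). A rough equivalent of the function above might look like the sketch below. It assumes the column name is passed as a string; `sumByColumnAcross` is a hypothetical name, and the constant `.names = "tot"` only works here because a single column and a single function are involved.

```r
library(dplyr)

data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Hypothetical across()-based variant; all_of() selects the column whose
# name is stored in the character variable `colName`.
sumByColumnAcross <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(across(all_of(colName), sum, .names = "tot"))
}

result <- sumByColumnAcross(data, "b")
```

Using all_of() makes it explicit that `colName` is an external character vector rather than a column of the data frame.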
CPak

We can use the .data pronoun, which looks up a column by a name stored as a string:

library(dplyr)

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarise(tot = sum(.data[[colName]]))
}

sumByColumn(data, "b")

#      a   tot
#* <int> <int>
#1     1    24
#2     2    27
Ronak Shah