Passing column name inside a function using dplyr

Question

I know that lazyeval can be used inside a function to refer to column names with dplyr, but I am stuck. In general, when writing a function that uses dplyr and takes column names as function arguments, what is the most idiomatic way to do this? Thanks.

 library(dplyr)
 library(lazyeval)

 ## Create data frame
 df0 <- data.frame(x=rnorm(100), y=runif(100))

 ##########################################
 ## Sample mean; this way works
 ##########################################
 df0 %>%
   filter(!is.na(x)) %>%
   summarize(mean=mean(x))

 ##########################################
 ## Sample mean via function; does not work
 ##########################################
 dfSummary2 <- function(df, var_y) { 
   p <- df %>%
        filter(!is.na(as.name(var_y))) %>%
        summarize(mean=mean(as.name(var_y)))
   return(p)
 }

 dfSummary2(df0, "x")
 #   mean
 # 1   NA
 # Warning message:
 # In mean.default("x") : argument is not numeric or logical: returning NA

 ##########################################
 ## Sample mean via function; also does not work
 ##########################################
 dfSummary <- function(df, var_y) {
   p <- df %>%
        filter(!is.na(var_y)) %>%
        summarize(mean=mean(var_y))
  return(p)
}

 dfSummary(df0, "x")
 #   mean
 # 1   NA
 # Warning message:
 # In mean.default("x") : argument is not numeric or logical: returning NA

You'll have to use `summarize_` and `filter_` instead. See for example [here](http://stackoverflow.com/questions/41810320/how-to-correctly-use-dplyr-verbs-inside-a-function-definition-in-r). — Axeman, Jan 23 '17 at 22:57
Thank you. Btw, is the use of **lazyeval::interp** required? — DavidH, Jan 23 '17 at 23:06

score 1 · Answer 1 · answered Jan 23 '17 at 23:20


The comment suggesting summarize_ and filter_ points in the right direction if you want to stay with dplyr; more information is available in vignette("nse").

For the given problem, though, the following function uses a variable column name without requiring dplyr:

dfSummary <- function(df, var_y) {
 mean(df[[var_y]], na.rm = TRUE) 
}

dfSummary(df0, "x")
[1] 0.105659

dfSummary(df0, "y")
[1] 0.4948618


manotheshark


Thank you. I was actually aware of this approach, but since my script extensively uses **dplyr**, I wanted to retain its functions. – DavidH Jan 23 '17 at 23:27

score 0 · Answer 2 · answered Jul 03 '22 at 17:37


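Note that summarize_ and filter_ are now deprecated. It's better to embrace the argument with {{ }} in both verbs and pass the column as a bare name, e.g. dfSummary(df0, x) rather than dfSummary(df0, "x"):

dfSummary <- function(df, var_y) {
   p <- df %>%
        filter(!is.na({{var_y}})) %>%
        summarize(mean=mean({{var_y}}))
  return(p)
}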
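
Since the question calls the function with a string ("x"), the .data pronoun may be a closer fit than {{ }}, which expects a bare column name. The following is only a minimal sketch, assuming a reasonably recent dplyr (1.0 or later); the name dfSummaryStr and the small example data frame are hypothetical and not part of the original question.

```r
library(dplyr)

# Hypothetical helper: the column name arrives as a string, e.g. "x".
# .data[[var_y]] looks the column up by name inside the data frame.
dfSummaryStr <- function(df, var_y) {
  df %>%
    filter(!is.na(.data[[var_y]])) %>%
    summarize(mean = mean(.data[[var_y]]))
}

# Small illustrative data frame (assumed, not the asker's random data)
df0 <- data.frame(x = c(1, 2, NA, 3), y = c(0.5, 0.25, 0.75, 1))

dfSummaryStr(df0, "x")
#   mean
# 1    2
```

With this form the call keeps the quoted column name from the question, so existing calls such as dfSummaryStr(df0, "x") would not need to change.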


Julien

