
I've been struggling with an issue that is quite similar to a question asked here before, but somehow I can't translate the solution given there to my own problem.

I start by making an example data frame:

library(dplyr)  # provides %>%, group_by_(), select_() and summarise_()
test.df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))
str(test.df)

The following function should create a new data frame with the mean of a "statvar" for each group of a "groupvar".

test.f <- function(df, groupvar, statvar) {
  df %>% 
    group_by_(groupvar) %>% 
    select_(statvar) %>%
    summarise_(
      avg = ~mean(statvar, na.rm = TRUE)
    )
} 

test.f(df = test.df,
       groupvar = "col1",
       statvar = "col2")

What I would like this to return is a data frame with two averages: one for all rows where col1 is a and one for all rows where col1 is b. Instead I get this:

  col1 avg
1    a  NA
2    b  NA
Warning messages:
1: In mean.default("col2", na.rm = TRUE) :
  argument is not numeric or logical: returning NA
2: In mean.default("col2", na.rm = TRUE) :
  argument is not numeric or logical: returning NA

I find this strange, because I'm pretty sure col2 is numeric:

str(test.df)
'data.frame':   10 obs. of  2 variables:
 $ col1: Factor w/ 2 levels "a","b": 1 1 1 1 1 2 2 2 2 2
 $ col2: num  0.4269 0.1928 0.7766 0.0865 0.1798 ...
asked by 1053Inator

2 Answers

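library(lazyeval)  # provides interp()
library(dplyr)

test.f <- function(df, groupvar, statvar) {
  df %>% 
    group_by_(groupvar) %>% 
    select_(statvar) %>%
    summarise_(
      # interp() swaps the name statvar in the formula for the symbol
      # as.name(statvar), giving ~mean(col2, na.rm = TRUE)
      avg = (~mean(statvar, na.rm = TRUE)) %>%
        interp(statvar = as.name(statvar))
    )
} 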

test.f(df = test.df,
       groupvar = "col1",
       statvar = "col2")

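Your issue is that the string "col2" is substituted for statvar, so summarise_ ends up computing mean("col2"): the mean of a character string, which is undefined (hence the NA and the warning), rather than the mean of the column col2.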
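A minimal base R sketch of that difference, assuming no packages at all: the string "col2" is just a character value, while as.name("col2") is a symbol that R looks up among the data frame's columns when it is evaluated there. The set.seed() call is added here only to make the run repeatable.

```r
# Example data as in the question (seed added only for repeatability)
set.seed(1)
test.df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = runif(10))

statvar <- "col2"

# Averaging the string itself: mean() warns and returns NA,
# which is what the original summarise_() call computed
bad <- suppressWarnings(mean(statvar, na.rm = TRUE))

# Converting the string to a symbol and evaluating it inside the data frame
# looks up the column col2 instead
good <- eval(as.name(statvar), envir = test.df)
avg_all <- mean(good, na.rm = TRUE)

print(bad)      # NA
print(avg_all)  # mean of all ten col2 values
```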

answered by bramtayl
  • This works perfectly, many thanks. So interp() basically says "R, you should see this as a variable and not as a character string"? Yet I'm still a bit puzzled why avg is connected to interp() with the piping symbol %>%. – 1053Inator Oct 04 '15 at 12:33
  • @1053Inator, You could write it as `avg = interp(~mean(statvar, na.rm = TRUE), statvar = as.name(statvar))` without piping – talat Oct 04 '15 at 13:11
  • 1
    interp takes the expresion `~mean(statvar, na.rm = TRUE)` and replaces every time it sees the word statvar with the result of as.name(statvar), i.e., col2. So the expression is transformed to `~mean(col2, na.rm = TRUE)` – bramtayl Oct 04 '15 at 15:37
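The substitution described in that last comment can be reproduced in base R with bquote(), whose .() marker splices a value into an expression much as interp() does with its named argument. This is a sketch of the same idea, not lazyeval itself; the helper name make_mean_call is hypothetical.

```r
# Hypothetical helper: build the call mean(<column>, na.rm = TRUE)
# from a column name supplied as a string
make_mean_call <- function(statvar) {
  col_sym <- as.name(statvar)              # "col2" -> the symbol col2
  bquote(mean(.(col_sym), na.rm = TRUE))   # splice the symbol into the call
}

expr <- make_mean_call("col2")
print(expr)  # mean(col2, na.rm = TRUE)

# Evaluating the built call inside a data frame uses its col2 column
df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = 1:10)
result <- eval(expr, envir = df)
print(result)  # 5.5
```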

With the soon to be released dplyr 0.6.0 (eventually published on CRAN as dplyr 0.7.0), new functionality can help. The new function is UQ() (written !! for short), and it unquotes what has been quoted. You are passing statvar as a string like "col2". dplyr has underscore variants such as group_by_ and select_ that accept strings directly, but with summarise_ the string has to be spliced into a formula, which gets ugly, as in the answer above. We can now use the regular summarise function and unquote the quoted variable name. For more help on what "unquote the quoted" means, see this vignette. At the time of writing, only the development version has it.

library(dplyr)
test.df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))
test.f <- function(df, groupvar, statvar) {
  q_statvar <- as.name(statvar)  # turn the string "col2" into the symbol col2
  df %>% 
    group_by_(groupvar) %>% 
    select_(statvar) %>%
    summarise(
      avg = mean(!!q_statvar, na.rm = TRUE)  # !! unquotes q_statvar, so this is mean(col2)
    )
} 

test.f(df = test.df,
       groupvar = "col1",
       statvar = "col2")
# Output (the averages vary between runs because runif() is not seeded):
# # A tibble: 2 × 2
#     col1       avg
#   <fctr>     <dbl>
# 1      a 0.6473072
# 2      b 0.4282954
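For comparison, a base R sketch that uses a different technique from both answers: when the column names arrive as strings, df[[name]] indexes by string directly, so no quoting or unquoting is needed at all. The function name group_mean and the output column name group are hypothetical choices for illustration.

```r
# Hypothetical base R counterpart of test.f(): mean of statvar per level of groupvar
group_mean <- function(df, groupvar, statvar) {
  # tapply() splits df[[statvar]] by df[[groupvar]] and applies mean() to each piece
  avgs <- tapply(df[[statvar]], df[[groupvar]], mean, na.rm = TRUE)
  data.frame(group = names(avgs), avg = as.vector(avgs))
}

# Deterministic data so the result can be checked by hand
test.df <- data.frame(col1 = rep(c("a", "b"), each = 5), col2 = 1:10)
res <- group_mean(test.df, groupvar = "col1", statvar = "col2")
print(res)
#   group avg
# 1     a   3
# 2     b   8
```

The trade-off is that this returns a plain data frame and does not plug into a dplyr pipeline.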
answered by Pierre L