UPDATE July 2020:
dplyr 1.0 has changed pretty much everything about this question as well as all of the answers. See the dplyr programming vignette here:
https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html
The new way to refer to columns when their identifier is stored as a character vector is to use the .data pronoun from rlang, and then subset as you would in base R.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
df %>%
  select(-matches(drp)) %>%
  group_by(.data[[key]]) %>%
  summarise(total = sum(.data[[val]], na.rm = TRUE))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 2
#>   v3    total
#>   <chr> <int>
#> 1 A        21
#> 2 B        19
If your code is in a package function, you can add the roxygen2 tag @importFrom rlang .data to avoid R CMD check notes about undefined globals.
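As a rough illustration of the package case, the sketch below wraps the pipeline above in a function. The name sum_by_key and its arguments are hypothetical, invented here for the example; only the .data pronoun and the @importFrom tag come from the text above.

```r
library(dplyr)

#' Sum one column within groups of another (hypothetical helper)
#'
#' @param data A data frame.
#' @param key,val Column names, each given as a single string.
#' @importFrom rlang .data
sum_by_key <- function(data, key, val) {
  data %>%
    # .data[[key]] looks the column up by the string stored in `key`
    group_by(.data[[key]]) %>%
    summarise(total = sum(.data[[val]], na.rm = TRUE), .groups = "drop")
}

df <- tibble(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
res <- sum_by_key(df, key = "v3", val = "v2")
res
```

Setting .groups = "drop" is a choice made for this sketch: it silences the ungrouping message shown in the output above and returns an ungrouped tibble.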
ORIGINAL QUESTION:
I want to refer to an unknown column name inside a summarise. The standard evaluation functions introduced in dplyr 0.3 allow column names to be referenced using variables, but this doesn't appear to work when you call a base R function within e.g. a summarise.
library(dplyr)
key <- "v3"
val <- "v2"
drp <- "v1"
df <- data_frame(v1 = 1:5, v2 = 6:10, v3 = c(rep("A", 3), rep("B", 2)))
The df looks like this:
> df
Source: local data frame [5 x 3]
  v1 v2 v3
1  1  6  A
2  2  7  A
3  3  8  A
4  4  9  B
5  5 10  B
I want to drop v1, group by v3, and sum v2 for each group:
df %>% select(-matches(drp)) %>% group_by_(key) %>% summarise_(sum(val, na.rm = TRUE))
Error in sum(val, na.rm = TRUE) : invalid 'type' (character) of argument
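The error message points at sum() itself: val holds the string "v2", so sum() is handed a character vector rather than the column that string names. A minimal sketch, using base R only, that appears to reproduce the same message without dplyr (the msg variable is introduced here just to capture the error text):

```r
val <- "v2"

# sum() receives the string "v2", not the column it names
msg <- tryCatch(
  sum(val, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
msg
```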
The NSE version of select() works fine, since it can match a character string. The SE version of group_by(), group_by_(), works fine, since it can now accept variables as arguments and evaluate them. However, I haven't found a way to achieve similar results when using base R functions inside dplyr functions.
Things that don't work:
df %>% group_by_(key) %>% summarise_(sum(get(val), na.rm = TRUE))
Error in get(val) : object 'v2' not found
df %>% group_by_(key) %>% summarise_(sum(eval(as.symbol(val)), na.rm = TRUE))
Error in eval(expr, envir, enclos) : object 'v2' not found
I've checked out several related questions, but none of the proposed solutions have worked for me so far.