I'm trying to use dplyr's count() with a dynamic variable name instead of a column name. Before, I would use count_(), but this is now deprecated. What is the best replacement?

Minimal reproducible example:

library(dplyr)
df <- data.frame(id = 1:10, city = sample(c("London","Paris","Amsterdam"), 10, replace=TRUE))
colname <- "city"

Here's what I've tried:

df %>% count( city )  # desired output (works but isn't dynamic)
df %>% count( !!colname )  # doesn't work, makes it literally "city"
df %>% count( vars(colname) )  # doesn't work
df %>% count( eval(colname) )  # doesn't work either
df %>% count( eval(parse(text=colname)) )  # works, but is not 'dplyr' ?
df %>% count( eval(sym(colname)) )  # works, but using `sym` from 'rlang'
df %>% count( !!as.name(colname) )  # works, but using `as.name` from 'base'
df %>% count_( colname )  # works, but is deprecated

Not sure whether any of the above is the preferred method, or whether it's something altogether different?

Thanks in advance!

PS. I found the as.name() solution here.

rvrvrv

2 Answers

In the development version of dplyr, which will soon be released on CRAN as version 1.0, the across function seems like the appropriate choice:

df %>% count(across(colname))
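
In later releases of dplyr and tidyselect, a bare character vector such as colname inside across() is treated as ambiguous, since it could be a column called colname or an external variable. Wrapping it in all_of() makes the intent explicit. A minimal sketch, assuming dplyr 1.0 or later and the df and colname from the question (set.seed() is added here only so the sample is repeatable):

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed, not part of the original example
df <- data.frame(
  id = 1:10,
  city = sample(c("London", "Paris", "Amsterdam"), 10, replace = TRUE)
)
colname <- "city"

# all_of() tells tidyselect that colname is an external character vector
# holding column names, not a column of df.
res <- df %>% count(across(all_of(colname)))
res
```

The result should match df %>% count(city), with the grouping column still named city.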

In the current CRAN version of dplyr, the group_by_at() function can take a string, so you could do:

df %>% group_by_at(colname) %>% tally

If there were a count_at convenience function, the natural analogous thing to do would be:

df %>% count_at(colname)

But dplyr doesn't have a count_at function, so that doesn't work.

group_by_at can also work with a mixture of strings and names when used with vars, so you could do:

colname = "cyl"
mtcars %>% group_by_at(vars(colname, vs)) %>% tally

group_by_at will continue to work in dplyr 1.0 (it is superseded by across() there, but not removed), so you could create your own count_at function if you wish. This will work with strings, names, or even a mixture of the two:

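# User-defined helper (not part of dplyr): group by the given columns, then count rows.
# vars() accepts strings, bare names, or a mix of both.
count_at = function(data, ...) {
  data %>% group_by_at(vars(...)) %>% tally
}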

colname="city"
df %>% count_at(colname)
  city          n
* <fct>     <int>
1 Amsterdam     3
2 London        3
3 Paris         4

mtcars %>% count_at("cyl", vs)
    cyl    vs     n
  <dbl> <dbl> <int>
1     4     0     1
2     4     1    10
3     6     0     3
4     6     1     4
5     8     0    14
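
The same kind of helper can be sketched with across() instead of group_by_at(). This version assumes dplyr 1.0 or later and accepts column names only as strings (unlike count_at above, it does not take bare names). The name count_by_name is made up for illustration and is not a dplyr function:

```r
library(dplyr)

# Hypothetical helper: count rows by the columns named in a character vector.
count_by_name <- function(data, cols) {
  # all_of() marks cols as an external vector of column names.
  data %>% count(across(all_of(cols)))
}

res <- mtcars %>% count_by_name(c("cyl", "vs"))
res
```

Because the column names arrive as an ordinary character vector, the helper can be called from other functions without any quoting or unquoting.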
eipi10
  • Great pointer on the upcoming `across` functionality, very interesting! I guess I'll stick to `!!as.name(colname)` for now, as it seems the shortest, and without additional dependencies. But will be switching to `dplyr 1.0` when it's out. Either way, your answer is great going forward! – rvrvrv May 11 '20 at 22:26
  • I actually often use `group_by_at()` when I write functions using dplyr, as it avoids the need for non-standard evaluation. `across` provides the same benefit, but is more flexible, so I've been switching over to it more recently. – eipi10 May 11 '20 at 23:57

!! alone is not enough to use a string as a column name in dplyr; you first need to convert the string to a symbol with rlang::sym:

df %>% count(!!rlang::sym(colname))
# A tibble: 3 x 2
  city          n
  <fct>     <int>
1 Amsterdam     2
2 London        7
3 Paris         1
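
Another option that avoids building a symbol by hand is the .data pronoun, which dplyr re-exports from rlang. A minimal sketch, assuming a reasonably recent dplyr and the df and colname from the question (the seed is an addition here, only for repeatability):

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original example
df <- data.frame(
  id = 1:10,
  city = sample(c("London", "Paris", "Amsterdam"), 10, replace = TRUE)
)
colname <- "city"

# .data[[colname]] looks up the column whose name is stored in colname.
res_pronoun <- df %>% count(.data[[colname]])

# Same counts as the sym() approach shown above.
res_sym <- df %>% count(!!rlang::sym(colname))
res_pronoun
```

Both forms subset by a string, so either can be used inside a function that receives the column name as an argument.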

For more detail on standard vs. non-standard evaluation (SE vs. NSE) in dplyr, you can have a look at a blog post I wrote on the subject.

linog