
I am trying to search a database and then label the output with a name derived from the original search, "derived_name" in the reproducible example below. I am using a dplyr pipe %>%, and I am having trouble with quasiquotation and/or non-standard evaluation. Specifically, passing count_colname, a character object derived from "derived_name", to the final top_n() call fails to subset the data frame.

library(dplyr)  # %>% and top_n()
library(rlang)  # sym() and qq_show(), used further below
search_name <- "derived_name"
set.seed(1)
letrs <- letters[rnorm(52, 13.5, 5)]
letrs_count.df <- letrs %>%
    table() %>%
    as.data.frame()
count_colname <- paste0(search_name, "_letr_count")
colnames(letrs_count.df) <- c("letr", count_colname)
letrs_top.df <- letrs_count.df %>%
    top_n(5, count_colname)
identical(letrs_top.df, letrs_count.df)
# [1] TRUE

Based on this discussion I thought the code above would work. And this post led me to try top_n_(), which does not seem to exist.

I am studying vignette("programming") which is a little over my head. This post led me to try the !! sym() syntax, which works, but I have no idea why! Help understanding why the below code works would be much appreciated. Thanks.

colnames(letrs_count.df) <- c("letr", count_colname)
letrs_top.df <- letrs_count.df %>%
    top_n(5, (!! sym(count_colname)))
letrs_top.df
#   letr derived_name_letr_count
# 1    l                       5
# 2    m                       6
# 3    o                       7
# 4    p                       5
# 5    q                       6

Additional confusing examples based on @lionel and @Tung's questions and comments below. What is confusing me here is that the help files say that sym() "take strings as input and turn them into symbols" and !! "unquotes its argument". However, in the examples below, sym(count_colname) appears to unquote to derived_name_letr_count. I do not understand why the !! is needed in !! sym(count_colname), since sym(count_colname) and qq_show(!! sym(count_colname)) give the same value.

count_colname
# [1] "derived_name_letr_count"
sym(count_colname)
# derived_name_letr_count
qq_show(count_colname)
# count_colname
qq_show(sym(count_colname))
# sym(count_colname)
qq_show(!! sym(count_colname))
# derived_name_letr_count
qq_show(!! count_colname)
# "derived_name_letr_count"
Tung
Josh
  • [`dplyr` automatically quotes its inputs](https://dplyr.tidyverse.org/articles/programming.html). Here is the source code of [`top_n`](https://github.com/tidyverse/dplyr/blob/master/R/top-n.R): it uses `enquo` & `!!` to quote and unquote the inputs as well. Run `qq_show(!!quo(sym(count_colname)))` to see why you need to unquote `sym(count_colname)` with `!!` first before supplying to `top_n` – Tung Sep 12 '18 at 15:50

2 Answers


According to the top_n documentation (?top_n), the wt argument expects a variable, not a character string, so the 1st example didn't work. In your 2nd example, rlang::sym converted the string to a symbol (a variable name), and !! then unquoted it so that it could be evaluated as a column inside top_n. Note: top_n and other dplyr verbs automatically quote their inputs.
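To make the difference concrete, here is a minimal sketch using a small made-up data frame (`df`, `col` and the column `n_count` are placeholders, not part of the question). It evaluates a string and a symbol against the data, and then shows what gets captured with and without `!!`.

```r
library(rlang)

# Hypothetical toy data, only for illustration
df <- data.frame(letr = c("a", "b", "c"), n_count = c(3, 1, 2))
col <- "n_count"

# A string evaluates to itself: a constant, not a column of df
as_string_result <- eval_tidy(expr(!!col), data = df)
as_string_result

# A symbol built from the string is looked up in the data
as_symbol_result <- eval_tidy(expr(!!sym(col)), data = df)
as_symbol_result

# Without !!, the quoting function captures the call sym(col) itself
captured_call <- expr(sym(col))
captured_call

# With !!, sym(col) is run first and its result (a symbol) is inserted
captured_symbol <- expr(!!sym(col))
captured_symbol
```

This is likely why the first example returned every row: a constant string gives all rows the same weight, so they all tie.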

Using rlang::qq_show as suggested by @lionel, we can see that it doesn't work because there is no count_colname column in letrs_count.df:

library(tidyverse)

set.seed(1)
letrs <- letters[rnorm(52, 13.5, 5)]
letrs_count.df <- letrs %>%
  table() %>%
  as.data.frame()

search_name <- "derived_name"
count_colname <- paste0(search_name, "_letr_count")
colnames(letrs_count.df) <- c("letr", count_colname)
letrs_count.df
#>    letr derived_name_letr_count
#> 1     b                       1
#> 2     c                       1
#> 3     f                       2
...

rlang::qq_show(top_n(letrs_count.df, 5, count_colname))
#> top_n(letrs_count.df, 5, count_colname)

Together, sym and !! produce the actual column name that exists in letrs_count.df:

rlang::qq_show(top_n(letrs_count.df, 5, !! sym(count_colname)))
#> top_n(letrs_count.df, 5, derived_name_letr_count)

letrs_count.df %>%
  top_n(5, !! sym(count_colname))
#>   letr derived_name_letr_count
#> 1    l                       5
#> 2    m                       6
#> 3    o                       7
#> 4    p                       5
#> 5    q                       6

top_n(x, n, wt)

Arguments:

  • x: a tbl() to filter

  • n: number of rows to return. If x is grouped, this is the number of rows per group. Will include more than n rows if there are ties. If n is positive, selects the top n rows. If negative, selects the bottom n rows.

  • wt: (Optional). The variable to use for ordering. If not specified, defaults to the last variable in the tbl. This argument is automatically quoted and later evaluated in the context of the data frame. It supports unquoting. See vignette("programming") for an introduction to these concepts.
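Because `wt` is quoted and supports unquoting, the same pattern can be wrapped in a function. The sketch below is only an illustration: `top_by()`, `top_by_name()` and the toy data frame are hypothetical names, not part of dplyr. One helper takes a bare column name and uses `enquo()`, the other takes a string and uses `sym()`.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: accepts a bare column name, e.g. top_by(df, 2, cnt)
top_by <- function(data, n, wt) {
  wt <- enquo(wt)           # capture the expression the caller typed
  top_n(data, n, !!wt)      # unquote it so top_n sees the column
}

# Hypothetical helper: accepts the column name as a string
top_by_name <- function(data, n, wt_name) {
  top_n(data, n, !!sym(wt_name))  # string -> symbol -> unquoted
}

# Toy data, only for illustration
toy <- data.frame(letr = c("a", "b", "c", "d"),
                  cnt = c(4, 1, 3, 2),
                  stringsAsFactors = FALSE)

res_bare <- top_by(toy, 2, cnt)
res_name <- top_by_name(toy, 2, "cnt")
res_bare
res_name
```

Both calls should select the same rows; which helper is more convenient depends on whether the column name arrives as code or as data.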

See also these answers: 1st, 2nd, 3rd

Tung
  • thank you. The [3rd post](https://stackoverflow.com/questions/49700912/why-is-enquo-preferable-to-substitute-eval/49702437#49702437) you referenced made me think this would work `x <- enquo(count_coln); laut.top.df <- laut.count.df %>% top_n(max(10, ceiling(percent * nrow(.) / 100)), !! x)` but it does not. This is definitely the most frustrating concept I've encountered. – Josh Aug 08 '18 at 13:47
  • @Josh: what is `count_coln`? `enquo` is usually used inside a function. I suggest you watch [Hadley's 5min tidy evaluation video](https://www.youtube.com/watch?v=nERXS3ssntw) – Tung Aug 08 '18 at 14:05
  • thanks, I wondered if that was the issue with `enquo`. Still trying to wrap my head around what `sym` does. `count_colname` represents what I want the column name to be. It consists of a unique identifier derived from my original database search (`"derived_name"` in the example) and `"_letr_count"`. I make it here `count_colname <- paste0(search_name, "_letr_count")` and use it here `colnames(letrs_count.df) <- c("letr", count_colname)` and in the `top_n` function that is causing my struggle. – Josh Aug 08 '18 at 19:32
  • `sym()` transforms a string into a variable name. I suggest you use `rlang::qq_show()` to experiment with unquoting and see the results. For instance try `var <- "cyl"; rlang::qq_show(mutate(data, !!var + 1))`. Then try with `!!sym(var)` – Lionel Henry Aug 08 '18 at 19:46
  • @Josh: This link might help ease your pain going into tidyeval https://colinfay.me/tidyeval-1/ – Tung Aug 08 '18 at 20:16

So, I've realized that what I was struggling with in this question (and many other problems) is not really quasiquotation and/or non-standard evaluation, but rather converting character strings into object names. Here is my new solution:

letrs_top.df <- letrs_count.df %>%
    top_n(5, get(count_colname))
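As a possible alternative to `get()`, the `.data` pronoun from rlang (available inside dplyr verbs) can look up a column by a string. The sketch below uses a small made-up data frame (`toy` and its values are placeholders) and compares the two approaches. One difference to be aware of: `get()` can also find an object of the same name outside the data frame, whereas `.data[[...]]` only looks at columns.

```r
library(dplyr)

# Toy data, only for illustration; the column name mirrors the question
toy <- data.frame(letr = c("a", "b", "c", "d"),
                  derived_name_letr_count = c(4, 1, 3, 2),
                  stringsAsFactors = FALSE)
count_colname <- "derived_name_letr_count"

# Approach from the answer above: look the column up with get()
top_get <- toy %>%
  top_n(2, get(count_colname))

# Alternative sketch: subset the .data pronoun with the string
top_data <- toy %>%
  top_n(2, .data[[count_colname]])

top_get
top_data
```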
Josh