Accessing New Variables in a dplyr function

Question

I am writing a function to calculate odds ratios for a table of counts that requires NSE evaluation with dplyr and tidyr. As may be apparent, this is my first venture into the NSE world.

For example, with a dataframe 'foo':

# A tibble: 4 x 3
   strata   group     select
    <chr>   <chr>      <chr>
1 Manager A_Group     Chosen
2  Worker A_Group     Chosen
3 Manager B_Group Not_Chosen
4  Worker B_Group     Chosen
5 ...

I first do counts: foo2 <- foo %>% count(strata, group, select)

# A tibble: 8 x 4
   strata   group     select     n
    <chr>   <chr>      <chr> <int>
1 Manager A_Group     Chosen     1
2 Manager A_Group Not_Chosen     9
3 Manager B_Group     Chosen     1
4 Manager B_Group Not_Chosen     3
5 ...

Next, I collapse into wide format using tidyr's unite and spread which names the new columns by the values of the group and select columns:

foo2 %>% unite(cat, c(group, select)) %>% 
    spread(cat, n, fill = 0)

# A tibble: 2 x 5
strata A_Group_Chosen A_Group_Not_Chosen B_Group_Chosen B_Group_Not_Chosen
*   <chr>          <dbl>              <dbl>          <dbl>              <dbl>
1 Manager              1                  9              1                  3
2  Worker              1                 11              1                  3

And last, I calculate a new column, OR as

 ... %>% mutate(OR = (A_Group_Chosen * B_Group_Not_Chosen) /
  (A_Group_Not_Chosen * B_Group_Chosen))

To put this code in a function, I handle the original columns with enquo and !!, but to calculate the new column, OR, I need the newly created columns (named by the concatenation of the values of the group and select columns). The question is how to 'unquote' the names for OR calculation?

My current draft saves the intermediate result after the unite/spread, put the names into a vector, and use the $`!!'() operator. This feels pretty kludgy. A better way?

My function:

OR_tab <- function(dat, strat, grp, decision ){
  strat <- enquo(strat)
  grp <- enquo(grp)
  decision <- enquo(decision)



  tab <- dat %>% count(!!strat, !!grp, !!decision) %>% unite(cat, c(!!grp, !!decision)) %>% 
    spread(cat, n, fill = 0)
  nm <- names(tab)[2:5]
  tab %>% mutate(OR = (tab$`!!`(nm[1]) * tab$`!!`(nm[4])) / (tab$`!!`(nm[2]) * (tab$`!!`(nm[3])))) %>% 
    print(n = Inf)
}

OR_tab(foo, strata, group, select)

My original dataframe, 'foo' :

> dput(foo2)
structure(list(strata = c("Manager", "Worker", "Manager", "Manager", 
"Worker", "Manager", "Manager", "Manager", "Worker", "Worker", 
"Worker", "Worker", "Worker", "Worker", "Manager", "Worker", 
"Worker", "Manager", "Manager", "Manager", "Worker", "Worker", 
"Manager", "Manager", "Manager", "Manager", "Worker", "Worker", 
"Worker", "Worker"), group = c("A_Group", "A_Group", "A_Group", 
"A_Group", "B_Group", "A_Group", "B_Group", "A_Group", "A_Group", 
"A_Group", "A_Group", "A_Group", "B_Group", "B_Group", "A_Group", 
"A_Group", "A_Group", "A_Group", "A_Group", "B_Group", "A_Group", 
"A_Group", "B_Group", "B_Group", "A_Group", "A_Group", "B_Group", 
"A_Group", "A_Group", "A_Group"), select = c("Chosen", "Chosen", 
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", 
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", 
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", 
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", 
"Not_Chosen", "Chosen", "Not_Chosen", "Not_Chosen", "Chosen", 
"Not_Chosen", "Not_Chosen", "Not_Chosen")), .Names = c("strata", 
"group", "select"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-30L))

This would be much easier if you did the calculation before you did the spread. It would be easier to help if you shared your data in a more [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so it can be copy/pasted into R for testing. Also you should provide the desired output for the sample input so possible solutions can be verified. — MrFlick, Nov 01 '17 at 20:25
I assume you write the function in order to re-use it for other `strata`, `group`, and `select` columns? Since the new column names depends on the combinations of the values of the `group` and `select` columns, it would help to let us know what the possible values of other `group` and `select` columns are. Does the `select` column always have "Chosen" and "Not_Chosen"? Does the `group` columns always have only two groups? — Paul, Nov 11 '17 at 06:31
@Paul: The function's concept requires that the group and select columns have only two values, but they can be any two. There can be multiple values for the strata column. — Jeff Bannon, Nov 13 '17 at 15:24

score 0 · Answer 1 · answered Apr 01 '18 at 14:34

You can avoid the need for unquoting after spread by doing the odds-ratio calculation on the long data (as suggested by @MrFlick):

library(tidyverse)

OR_tab2 <- function(dat, strat, grp, decision ){

  strat <- enquo(strat)
  grp <- enquo(grp)
  decision <- enquo(decision)

  dat %>% 
    count(!!strat, !!grp, !!decision) %>% 
    group_by(!!strat) %>% 
    mutate(OR = (n[1]*n[4])/(n[2]*n[3])) %>% 
    unite(cat, c(!!grp, !!decision)) %>% 
    spread(cat, n)

}

OR_tab2(foo2, strata, group, select)

  strata     OR A_Group_Chosen A_Group_Not_Chosen B_Group_Chosen B_Group_Not_Chosen
  <chr>   <dbl>          <int>              <int>          <int>              <int>
1 Manager 0.333              1                  9              1                  3
2 Worker  0.273              1                 11              1                  3

As with your original code, this will work for any data frame where the group and select arguments each have only two levels, but which pairs of levels are in the numerator or denominator will depend on the ordering of the levels in each column. For example, note that for the recoded data frame df2 below, the odds ratios are inverted relative to the those of the original data frame df.

df2 = foo2 %>% 
  mutate(experimental_groups = recode(group, 
                        "A_Group"="Control",
                        "B_Group"="Treatment"),
         flavor = recode(select, 
                         "Chosen"="Vanilla",
                         "Not_Chosen"="Chocolate"))

OR_tab2(df2, strata, experimental_groups, flavor)

  strata     OR Control_Chocolate Control_Vanilla Treatment_Chocolate Treatment_Vanilla
  <chr>   <dbl>             <int>           <int>               <int>             <int>
1 Manager  3.00                 9               1                   3                 1
2 Worker   3.67                11               1                   3                 1

OR_tab(df2, strata, experimental_groups, flavor)

  strata  Control_Chocolate Control_Vanilla Treatment_Chocolate Treatment_Vanilla    OR
  <chr>               <dbl>           <dbl>               <dbl>             <dbl> <dbl>
1 Manager                9.              1.                  3.                1.  3.00
2 Worker                11.              1.                  3.                1.  3.67

Accessing New Variables in a dplyr function

1 Answers1