I am writing a function to calculate odds ratios for a table of counts, which requires non-standard evaluation (NSE) with dplyr and tidyr. As may be apparent, this is my first venture into the NSE world.
For example, with a dataframe 'foo':
# A tibble: 4 x 3
strata group select
<chr> <chr> <chr>
1 Manager A_Group Chosen
2 Worker A_Group Chosen
3 Manager B_Group Not_Chosen
4 Worker B_Group Chosen
5 ...
I first do counts: foo2 <- foo %>% count(strata, group, select)
# A tibble: 8 x 4
strata group select n
<chr> <chr> <chr> <int>
1 Manager A_Group Chosen 1
2 Manager A_Group Not_Chosen 9
3 Manager B_Group Chosen 1
4 Manager B_Group Not_Chosen 3
5 ...
Next, I collapse the counts into wide format using tidyr's unite and spread, which name the new columns after the combined values of the group and select columns:
foo2 %>% unite(cat, c(group, select)) %>%
spread(cat, n, fill = 0)
# A tibble: 2 x 5
strata A_Group_Chosen A_Group_Not_Chosen B_Group_Chosen B_Group_Not_Chosen
* <chr> <dbl> <dbl> <dbl> <dbl>
1 Manager 1 9 1 3
2 Worker 1 11 1 3
Last, I calculate a new column, OR, as
... %>% mutate(OR = (A_Group_Chosen * B_Group_Not_Chosen) /
(A_Group_Not_Chosen * B_Group_Chosen))
To put this code in a function, I handle the original columns with enquo and !!. To calculate the new column, OR, however, I need the newly created columns, whose names are the concatenated values of the group and select columns and so are not known when the function is written. The question is: how do I 'unquote' those names for the OR calculation?
My current draft saves the intermediate result after the unite/spread, puts the names into a vector, and uses the $`!!`() operator. This feels pretty kludgy. Is there a better way?
My function:
OR_tab <- function(dat, strat, grp, decision) {
  strat <- enquo(strat)
  grp <- enquo(grp)
  decision <- enquo(decision)
  tab <- dat %>%
    count(!!strat, !!grp, !!decision) %>%
    unite(cat, c(!!grp, !!decision)) %>%
    spread(cat, n, fill = 0)
  nm <- names(tab)[2:5]
  tab %>%
    mutate(OR = (tab$`!!`(nm[1]) * tab$`!!`(nm[4])) / (tab$`!!`(nm[2]) * (tab$`!!`(nm[3])))) %>%
    print(n = Inf)
}
OR_tab(foo, strata, group, select)
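For comparison, here is a minimal sketch of one direction that avoids indexing into tab with $ inside mutate. It assumes a dplyr/rlang version that provides the .data pronoun, and it still relies on the four count columns appearing in alphabetical order in positions 2 to 5, as in my draft. The name OR_tab2 is made up for illustration, and I am not claiming this is the recommended approach.

```r
library(dplyr)
library(tidyr)

# Sketch only: OR_tab2 is a hypothetical name. The positional lookup of the
# count columns assumes exactly two groups and two decision levels.
OR_tab2 <- function(dat, strat, grp, decision) {
  strat <- enquo(strat)
  grp <- enquo(grp)
  decision <- enquo(decision)

  tab <- dat %>%
    count(!!strat, !!grp, !!decision) %>%
    unite(cat, !!grp, !!decision) %>%
    spread(cat, n, fill = 0)

  # Names of the columns created by unite/spread (sorted by spread)
  nm <- names(tab)[2:5]

  # .data[[...]] looks a column up by a string held in a variable
  tab %>%
    mutate(OR = (.data[[nm[1]]] * .data[[nm[4]]]) /
                (.data[[nm[2]]] * .data[[nm[3]]]))
}
```

Calling OR_tab2(foo, strata, group, select) should add the same OR column as the draft above. Based on the wide table shown earlier, that is (1 * 3) / (9 * 1) for Manager and (1 * 3) / (11 * 1) for Worker. Writing !!sym(nm[1]) in place of .data[[nm[1]]] appears to be an equivalent spelling, but I have not compared the two carefully.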
My original dataframe, 'foo':
> dput(foo)
structure(list(strata = c("Manager", "Worker", "Manager", "Manager",
"Worker", "Manager", "Manager", "Manager", "Worker", "Worker",
"Worker", "Worker", "Worker", "Worker", "Manager", "Worker",
"Worker", "Manager", "Manager", "Manager", "Worker", "Worker",
"Manager", "Manager", "Manager", "Manager", "Worker", "Worker",
"Worker", "Worker"), group = c("A_Group", "A_Group", "A_Group",
"A_Group", "B_Group", "A_Group", "B_Group", "A_Group", "A_Group",
"A_Group", "A_Group", "A_Group", "B_Group", "B_Group", "A_Group",
"A_Group", "A_Group", "A_Group", "A_Group", "B_Group", "A_Group",
"A_Group", "B_Group", "B_Group", "A_Group", "A_Group", "B_Group",
"A_Group", "A_Group", "A_Group"), select = c("Chosen", "Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Chosen", "Not_Chosen", "Not_Chosen", "Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen")), .Names = c("strata",
"group", "select"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-30L))