The Problem
I am trying to create a function that uses dplyr syntax and [], but I am using quosures incorrectly. The problem stems from my shaky understanding of quosures and tidy evaluation. I am hoping someone can explain why my function isn't working.
Background
I found this code really useful and wanted to turn it into a function whose arguments I could vary without using strings. Following the Programming with dplyr vignette, I got the function to this point. (Note: I modified the original code to suit my needs.)
library(dplyr)

persistence <- function(df, period, ...){
  period <- enquo(period)
  group_var <- quos(...)
  df %>%
    group_by(!!! group_var, !! period) %>%
    summarise(persistence_rate = length(base::intersect(id, df$id[df$rank==(rank+1)]))/n_distinct(id))
}
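For context, here is a minimal sketch of the two kinds of column reference the working version seems to rely on inside summarise(). Bare names like id and rank are masked to the current group's rows. df$id and df$rank are ordinary vectors covering the whole data frame. The toy data is hypothetical. Using %in% instead of == (to avoid length-recycling warnings) is my substitution, not part of the original code.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration.
toy <- data.frame(id     = c("A", "B", "A", "C"),
                  period = c("a", "a", "b", "b"),
                  rank   = c(1, 1, 2, 2),
                  stringsAsFactors = FALSE)

res <- toy %>%
  group_by(period) %>%
  summarise(
    # A bare column name such as `id` is masked to the current group's rows.
    n_rows_in_group = length(id),
    # `toy$id` is an ordinary vector holding every row, ignoring the grouping.
    n_rows_in_full  = length(toy$id),
    # Ids whose rank is one higher, looked up in the full data
    # (%in% is used here instead of == to avoid recycling warnings).
    next_rank_ids   = paste(toy$id[toy$rank %in% (rank + 1)], collapse = ",")
  )

print(res)
```

In period "a" the group has 2 rows, but the lookup through toy$id still finds the rank-2 ids "A" and "C" from period "b".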
With the data provided below, this function gives me the desired output:
persistence(data, period)
# A tibble: 5 x 2
  period persistence_rate
  <chr>             <dbl>
1 a                 0.500
2 b                 1.00
3 c                 0.667
4 d                 0.667
5 e                 0.
Unfortunately, when I tried to make the id and rank columns arguments as well, I was not sure how to incorporate the quosures.
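One way to get the full column from a quosure is dplyr::pull() with !!. It returns the ungrouped column, much as df$id does for a hard-coded name. This is a minimal sketch. The helper full_column() and the toy data are hypothetical.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: return the full (ungrouped) column the caller
# names, the way df$id does for a hard-coded column name.
full_column <- function(df, col) {
  col <- enquo(col)   # capture the bare column name as a quosure
  pull(df, !!col)     # unquote it inside pull()
}

toy <- data.frame(id = c("A", "B"), rank = c(1, 2), stringsAsFactors = FALSE)
full_column(toy, id)    # same as toy$id
full_column(toy, rank)  # same as toy$rank
```

Because pull() is applied to the data frame itself rather than inside a grouped summarise(), the result is not restricted to a single group.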
What I've Tried
Using this data:
data <- structure(list(id = c("A", "B", "C", "D", "A", "C", "A", "B", "C", "A", "D", "C", "A", "B", "C"),
period = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "d", "d", "d", "e", "e", "e"),
rank = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
group = c("g1", "g2", "g1", "g2", "g1", "g1", "g1", "g2", "g1", "g1", "g2", "g1", "g1", "g2", "g1")),
.Names = c("id", "period", "rank", "group"),
row.names = c(NA, -15L),
class = c("tbl_df", "tbl", "data.frame"))
I ended up with this function:
persistence_new <- function(df, id, period, rank, ...){
  period <- enquo(period)
  id <- enquo(id)
  rank <- enquo(rank)
  group_var <- quos(...)
  df %>%
    group_by(UQS(group_var), UQ(period)) %>%
    summarise(persistence_rate = length(base::intersect(UQ(id), UQ(id)[UQ(rank) == (UQ(rank) + 1)]))/n_distinct(UQ(id)))
}
This gives me the following result:
persistence_new(data, id, period, rank)
# A tibble: 5 x 2
  period persistence_rate
  <chr>             <dbl>
1 a                 0.
2 b                 0.
3 c                 0.
4 d                 0.
5 e                 0.
It took me a long time to get to this point. Many of my attempts along the way threw errors. Now the function runs, but it does not give me the results I want.
I tried essentially every combination of (), UQ(), [], and [[]] that I could think of.
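Here is a possible explanation, offered tentatively. In persistence_new(), UQ(id) and UQ(rank) are evaluated inside summarise(), so they refer only to the current group's rows. Within a period, rank is constant, so UQ(rank) == (UQ(rank) + 1) is FALSE for every row. The subset is therefore empty and every rate comes out as 0. The original version avoided this because df$id and df$rank pointed at the full data frame.

The sketch below pulls the full columns before grouping. It relies on several assumptions:
- persistence_fixed() is a hypothetical name.
- It assumes rank is constant within each period, as in the example data. Under that assumption, %in% gives the same result as == while avoiding recycling warnings.
- The data is rebuilt with data.frame() rather than structure().

```r
library(dplyr)
library(rlang)

data <- data.frame(
  id     = c("A", "B", "C", "D", "A", "C", "A", "B", "C", "A", "D", "C", "A", "B", "C"),
  period = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "d", "d", "d", "e", "e", "e"),
  rank   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5),
  group  = c("g1", "g2", "g1", "g2", "g1", "g1", "g1", "g2", "g1", "g1", "g2", "g1", "g1", "g2", "g1"),
  stringsAsFactors = FALSE
)

# Hypothetical corrected sketch; assumes `rank` is constant within each group.
persistence_fixed <- function(df, id, period, rank, ...) {
  id <- enquo(id)
  period <- enquo(period)
  rank <- enquo(rank)
  group_var <- quos(...)

  # Pull the full, ungrouped columns before grouping; these play the role
  # of df$id and df$rank in the original persistence().
  all_ids <- pull(df, !!id)
  all_ranks <- pull(df, !!rank)

  df %>%
    group_by(!!!group_var, !!period) %>%
    summarise(
      # Inside summarise(), !!id and !!rank refer to the current group only.
      persistence_rate = length(base::intersect(
        !!id, all_ids[all_ranks %in% ((!!rank) + 1)]
      )) / n_distinct(!!id)
    )
}

res <- persistence_fixed(data, id, period, rank)
print(res)
```

With this data, the sketch should reproduce the rates from persistence(): 0.5, 1, 0.667, 0.667 and 0 for periods a through e.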
Thanks
I am hoping to learn more about tidy evaluation so that I don't struggle this much in the future. Since the underlying problem is my lack of understanding, I would appreciate any explanation of why my current function doesn't work, and any insight that would make tidy evaluation feel more intuitive.
Alternatively, feel free to point me to a specific section of the Programming with dplyr vignette. I've worked through the entire thing twice, but a specific section to focus on may be useful.
I appreciate the help. Let me know if I can provide any additional information.
SessionInfo
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 dplyr_0.7.4
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 utf8_1.1.3 crayon_1.3.4 assertthat_0.2.0 R6_2.2.2
[6] magrittr_1.5 pillar_1.2.1 cli_1.0.0 rlang_0.2.0.9001 rstudioapi_0.7.0-9000
[11] tools_3.4.4 glue_1.2.0 yaml_2.1.19 compiler_3.4.4 pkgconfig_2.0.1
[16] bindr_0.1.1 tibble_1.4.2