
I'm comparing how convenient dplyr and data.table are to use inside loops and functions.

To do this, I'm trying to modify the code snippets from the post "data.table vs dplyr: can one do something well the other can't or does poorly?" so that, instead of hard-coding the variable names ("cut" and "price" from the diamonds dataset), they become dataset-agnostic. In other words, the code should be ready to cut and paste into any function or loop where the column names are not known in advance and the columns have to be accessed by number.

This is the original code:

library(dplyr)
library(ggplot2)  # provides the diamonds dataset

tbl = diamonds
tbl %>%
  filter(cut != "Fair") %>%
  group_by(cut) %>%
  summarize(
    AvgPrice = mean(price)
  )

I need to rewrite it so that I can use the same code in a loop like this one:

for (nVarGroup in 2:4) {      # Grouped by each possible categorical variable...
  for (nVarMeans in 5:10) {   # ...get the means of all numeric parameters
    # dataset-agnostic code goes here
  }
}
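A minimal sketch of what the filled-in loop could look like in dplyr, assuming dplyr >= 1.0 (for `across()`, `all_of()` and the `.groups` argument) and the column order of `ggplot2::diamonds` (2 to 4 are cut, color, clarity; 5 to 10 are depth, table, price, x, y, z). Column numbers are turned into names with `names()`, and those strings are then used through the `.data[[...]]` pronoun and `all_of()`. The `results` list and its `key` naming scheme are hypothetical choices for illustration only.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

tbl <- diamonds
results <- list()  # hypothetical container for one summary per combination

for (nVarGroup in 2:4) {      # categorical columns: cut, color, clarity
  for (nVarMeans in 5:10) {   # numeric columns: depth, table, price, x, y, z
    strVarGroup <- names(tbl)[nVarGroup]
    strVarMeans <- names(tbl)[nVarMeans]
    # keep every level except the first, as the original drops "Fair"
    strGroupConditions <- levels(tbl[[nVarGroup]])[-1]

    key <- paste(strVarGroup, strVarMeans, sep = "_")
    results[[key]] <- tbl %>%
      # .data[[name]] looks up the column whose name is stored in a string
      filter(.data[[strVarGroup]] %in% strGroupConditions) %>%
      # all_of() selects grouping columns from a character vector
      group_by(across(all_of(strVarGroup))) %>%
      summarize(Avg = mean(.data[[strVarMeans]]), .groups = "drop")
  }
}
```

Each element of `results` is a small tibble with the grouping column (keeping its original name) and an `Avg` column, e.g. `results[["cut_price"]]`.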

I've done this for data.table, as shown in "How to use data.table within functions and loops?".
However, I'm struggling to do the same with dplyr.
These links were recommended to solve the problem: "dplyr: How to use group_by inside a function?" and https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html. However, while they solve the group_by(strVarGroup) line below, they do not seem to offer a solution for the qGroup=quote(get(strVarGroup) %in% strGroupConditions) line.
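For the filter line specifically, one possible approach (a sketch, not taken from the linked answers) is dplyr's `.data` pronoun: `.data[[strVarGroup]]` refers to the column whose name is stored in the string `strVarGroup`, so it can play the role that `get(strVarGroup)` plays in data.table, inside `filter()`, `group_by()` and `summarize()` alike. This assumes a dplyr version with tidy evaluation support (0.7 or later) and the diamonds dataset from ggplot2.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

tbl <- diamonds
nVarGroup <- 2  # "cut"
nVarMeans <- 7  # "price"

strVarGroup <- names(tbl)[nVarGroup]
strVarMeans <- names(tbl)[nVarMeans]
# all levels except the first ("Fair"), as in the original attempt
strGroupConditions <- levels(tbl[[nVarGroup]])[-1]

res <- tbl %>%
  # replaces get(strVarGroup) %in% strGroupConditions
  filter(.data[[strVarGroup]] %in% strGroupConditions) %>%
  group_by(.data[[strVarGroup]]) %>%
  # the mean is taken over the column, not over the string itself
  summarize(AvgPrice = mean(.data[[strVarMeans]]))
```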

nVarGroup = 2 #"cut"
nVarMeans = 7 #"price"

strVarGroup = names(tbl)[nVarGroup]
strVarMeans = names(tbl)[nVarMeans]
qAction = quote(mean(strVarMeans))
strGroupConditions = levels(tbl[[nVarGroup]])[-1] # "Good" "Very Good" "Premium" "Ideal"
qGroup = quote(get(strVarGroup) %in% strGroupConditions)

### DOES NOT WORK ###
tbl %>%
   filter(eval(qGroup))    %>%
   group_by(strVarGroup) %>%
   summarize(
     AvgPrice = eval(qAction),
     ) 
### END: DOES NOT WORK ###
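If keeping prebuilt `qGroup`/`qAction` expressions is preferred, one sketch (assuming dplyr with tidy evaluation, 0.7 or later) is to build them with base R's `bquote()`, which behaves like `quote()` but splices in values wrapped in `.()`. Converting the column-name strings with `as.name()` puts real column symbols into the expressions, and `!!` then injects each expression into the dplyr verb. Note that the original `quote(mean(strVarMeans))` averages the string `"price"` itself rather than the column, which is one reason that line fails.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

tbl <- diamonds
nVarGroup <- 2  # "cut"
nVarMeans <- 7  # "price"

strVarGroup <- names(tbl)[nVarGroup]
strVarMeans <- names(tbl)[nVarMeans]
strGroupConditions <- levels(tbl[[nVarGroup]])[-1]

# bquote() is like quote(), but .() splices values in;
# as.name() turns each column-name string into a symbol
qGroup  <- bquote(.(as.name(strVarGroup)) %in% .(strGroupConditions))
qAction <- bquote(mean(.(as.name(strVarMeans))))

res2 <- tbl %>%
  filter(!!qGroup) %>%                  # !! injects the prebuilt expression
  group_by(!!as.name(strVarGroup)) %>%
  summarize(AvgPrice = !!qAction)
```

Here `qGroup` evaluates to the call `cut %in% c("Good", "Very Good", "Premium", "Ideal")` and `qAction` to `mean(price)`, so the pipeline is equivalent to the hard-coded version.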

Any additional links or ideas to help?
