6

I can obtain a summary of a variable stratified by other variables as follows:

require(data.table)

DT <- data.table(mtcars)

var_work <- "hp"
by_vars <- c("cyl", "carb")

ans_1 <- cube(DT, j = as.list(quantile(get(var_work))), by = by_vars)

ans_1
    cyl carb  0%    25%   50%    75% 100%
 1:   6    4 110 110.00 116.5 123.00  123
 2:   4    1  65  66.00  66.0  93.00   97
 3:   6    1 105 106.25 107.5 108.75  110
 4:   8    2 150 150.00 162.5 175.00  175
 5:   8    4 205 218.75 237.5 245.00  264
 6:   4    2  52  69.25  93.0 105.50  113
 7:   8    3 180 180.00 180.0 180.00  180
 8:   6    6 175 175.00 175.0 175.00  175
 9:   8    8 335 335.00 335.0 335.00  335
10:   6   NA 105 110.00 110.0 123.00  175
11:   4   NA  52  65.50  91.0  96.00  113
12:   8   NA 150 176.25 192.5 241.25  335
13:  NA    4 110 123.00 210.0 241.25  264
14:  NA    1  65  66.00  93.0 101.00  110
15:  NA    2  52  92.00 111.0 150.00  175
16:  NA    3 180 180.00 180.0 180.00  180
17:  NA    6 175 175.00 175.0 175.00  175
18:  NA    8 335 335.00 335.0 335.00  335
19:  NA   NA  52  96.50 123.0 180.00  335

Next, I would like to write a helper function implementing exactly what is shown above, which however produces the error:

my_fun <- function(table_work, var_w, by_v) {

    tab_out <- cube(table_work, j = as.list(quantile(get(var_w))), by = by_v)
    return(tab_out)

}

ans_2 <- my_fun(table_work = DT, var_w = var_work, by_v = by_vars)

Error in get(var_w) : object 'var_w' not found

I have searched for the answer some relevant blogs (e.g., Advanced tips and tricks with data.table) and posts (e.g., by Henrik, frankc etc.), and tried different combinations of quote(), eval(), get(), assign() etc. within "my_fun", but nothing worked for me.

The question is: How should I correct "my_fun" helper function so it works and produces the same result as ans_1?

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-18     fst_0.9.0         data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.6.1  parallel_3.6.1  Rcpp_1.0.3      grid_3.6.1      lattice_0.20-38

3 Answers3

4

When reading through the code for data.table:::cube.data.table and data.table:::groupingsets.data.table, the j argument is already being evaluated using NSE. Hence, being unable to pass in as.name(var_work) to the environment argument of substitute, the function will fail.

As a workaround, you can use .SDcols:

library(data.table)    
DT <- data.table(mtcars)    
var_work <- "hp"
by_vars <- c("cyl", "carb")

my_fun <- function(table_work, var_w, by_v) {
    cube(table_work, j=as.list(quantile(.SD[[1L]])), by=by_v, .SDcols=var_w)
}

ans_2 <- my_fun(table_work = DT, var_w = var_work, by_v = by_vars)
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • @jangorecki is the best person to answer this question imo. if he comes along, i will be glad to delete this answer – chinsoon12 Jan 07 '20 at 06:06
  • it works, thanks. Still, why cube(), and I guess other functions, cannot see 'var_w' even though it is explicitly declared within "my_fun"? – Max Moldovan Jan 07 '20 at 07:07
  • `var_w` is visible in the scope of `my_fun`. however, it is passed as `jj = substitute(j)` into `cube.data.table` and `groupingsets.data.table`, and `jj` is `eval` in the scope of the `groupingsets.data.table` which does not have `var_w`. – chinsoon12 Jan 07 '20 at 07:35
  • 1
    Just came across this Q, pinging my username didn't work AFAICT. This is good workaround. Of course we are open for suggestions or improvement if needed. – jangorecki May 06 '21 at 16:32
3

You can always compute on the language:

my_fun <- function(table_work, var_w, by_v) {

  var_w <- as.name(var_w)
  eval(bquote(cube(table_work, j = as.list(quantile(.(var_w))), by = by_v)))  

}
Roland
  • 127,288
  • 10
  • 191
  • 288
-1

Here is a tidyverse solution using https://tbradley1013.github.io/2018/10/01/calculating-quantiles-for-groups-with-dplyr-summarize-and-purrr-partial/

library(tidyverse)

p <- c(0, 0.25,.5,.75,1)

p_names <- map_chr(p, ~str_c(.x*100, "%"))

p_funs <- map(p, ~partial(quantile, probs = ., na.rm = TRUE)) %>% 
  set_names(nm = p_names)


my_pipe <- function(df,group_by_what,quantile_at){
  df %>% 
    group_by(!!!group_by_what) %>% 
    summarize_at(quantile_at,p_funs)
}

var_work <- vars(hp)

by_vars <- vars(cyl,carb)

my_pipe(mtcars,group_by_what = by_vars,quantile_at = var_work)
#> # A tibble: 9 x 7
#> # Groups:   cyl [3]
#>     cyl  carb  `0%` `25%` `50%` `75%` `100%`
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#> 1     4     1    65  66     66    93      97
#> 2     4     2    52  69.2   93   106.    113
#> 3     6     1   105 106.   108.  109.    110
#> 4     6     4   110 110    116.  123     123
#> 5     6     6   175 175    175   175     175
#> 6     8     2   150 150    162.  175     175
#> 7     8     3   180 180    180   180     180
#> 8     8     4   205 219.   238.  245     264
#> 9     8     8   335 335    335   335     335

# Easy to extend

var_work <- vars(hp,wt)

my_pipe(mtcars,group_by_what = by_vars,quantile_at = var_work)
#> # A tibble: 9 x 12
#> # Groups:   cyl [3]
#>     cyl  carb `hp_0%` `wt_0%` `hp_25%` `wt_25%` `hp_50%` `wt_50%` `hp_75%`
#>   <dbl> <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#> 1     4     1      65    1.84     66       1.94      66      2.2       93 
#> 2     4     2      52    1.51     69.2     1.75      93      2.46     106.
#> 3     6     1     105    3.22    106.      3.28     108.     3.34     109.
#> 4     6     4     110    2.62    110       2.81     116.     3.16     123 
#> 5     6     6     175    2.77    175       2.77     175      2.77     175 
#> 6     8     2     150    3.44    150       3.44     162.     3.48     175 
#> 7     8     3     180    3.73    180       3.76     180      3.78     180 
#> 8     8     4     205    3.17    219.      3.64     238.     4.54     245 
#> 9     8     8     335    3.57    335       3.57     335      3.57     335 
#> # … with 3 more variables: `wt_75%` <dbl>, `hp_100%` <dbl>, `wt_100%` <dbl>

Created on 2020-01-07 by the reprex package (v0.3.0)

Bruno
  • 4,109
  • 1
  • 9
  • 27