
Say I have the following

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))

car_list = list(cars1, cars2)
class(car_list) <- "dd"

`[.dd` <- function(x,...) {
  code = rlang::enquos(...)
  cars1 = x[[1]]
  # attempt: splice the captured arguments into cars1[...] and evaluate
  rlang::eval_tidy(rlang::quo(cars1[!!!code]))
}

car_list[,.N, by = speed]

I want to perform arbitrary operations on cars1 and cars2 by defining a `[.dd` method, so that whatever I put into `...` gets executed on cars1 and cars2 using data.table's `[` syntax. For example,

car_list[,.N, by = speed] should perform the following

cars1[,.N, by = speed]
cars2[,.N, by = speed]

Also, I want

car_list[,speed*2]

to do

cars1[,speed*2]
cars2[,speed*2]

Basically, the `...` in `[.dd` has to accept arbitrary code.

Somehow I need to capture the `...`, so I tried `code = rlang::enquos(...)` followed by `rlang::eval_tidy(quo(cars1[!!!code]))`, but it doesn't work and gives the error

Error in `[.data.table`(cars1, ~, ~.N, by = ~speed) : argument "i" is missing, with no default
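The tildes in that error come from the captured arguments themselves. A small check, assuming only that rlang is installed (the helper names `capture_quos` and `capture_exprs` are hypothetical), shows what `enquos()` and `enexprs()` each capture:

```r
# Hypothetical helpers: each just returns what it captured from `...`
capture_quos  <- function(...) rlang::enquos(...)
capture_exprs <- function(...) rlang::enexprs(...)

qs <- capture_quos(.N, by = speed)
es <- capture_exprs(.N, by = speed)

# A quosure is a one-sided formula that also carries its environment,
# which is why the call reaching `[.data.table` showed ~.N and ~speed
class(qs[[1]])  # "quosure" "formula"

# enexprs() returns bare expressions (here, plain symbols)
es$by           # speed
```

data.table's `[` only understands the bare expressions, not the formula wrappers.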

xiaodai
  • I don't use rlang but the tildes in the error message shouldn't be there. data.table subsetting doesn't expect formulas. – Roland Jul 20 '19 at 08:51
  • @Roland I don't need to use rlang if I don't have to, but I just don't know how to achieve what I want. Hence the question. – xiaodai Jul 20 '19 at 08:52
  • I'd just use base eval after substituting the expression into the subset expression (with base functionality). – Roland Jul 20 '19 at 08:57
  • @Roland I would accept the answer if you just post an example as the answer. Thanks – xiaodai Jul 20 '19 at 08:58
  • Maybe later this weekend if someone else doesn't do it. No R on my phone. – Roland Jul 20 '19 at 08:59
  • 1
    Try using `rlang::enexprs` instead of `enquos`, and remove the call to `quo` inside `eval_tidy`, it's not needed. – Alexis Jul 20 '19 at 09:22
  • 2
    Just to clarify, you *can’t* use tidyeval here, because tidyeval needs to be supported by the callee, and the data.table subsetting operator doesn’t support tidyeval. As a consequence, this is unfortunately a lot more complicated to achieve. – Konrad Rudolph Jul 20 '19 at 12:16
  • related qn: https://stackoverflow.com/questions/9705488/using-data-table-i-and-j-arguments-in-functions – chinsoon12 Jul 22 '19 at 03:31
  • Is there a reason why you would like to do this? Just curious – marbel Jul 22 '19 at 15:22
  • @marble it's for my package disk.frame https://github.com/xiaodaigh/disk.frame – xiaodai Jul 23 '19 at 07:09

3 Answers


While it doesn't follow the rlang mantra, this approach seems to work pretty well: `lapply(dt_list, '[', ...)`. The code is more readable to me because it is explicit about which method is being used. If I saw `car_list[, .N, by = speed]`, I would expect the default data.table method.

Making it a method gives you the best of both worlds:

class(car_list) <- "dd"

`[.dd` <- function(x,...) {
 lapply(x, '[', ...)
}

car_list[, .N, speed]
car_list[, speed * 2]
car_list[, .(.N, max(dist)), speed]
car_list[, `:=` (more_speed = speed+5)]
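Why does forwarding `...` through `lapply()` keep data.table's non-standard evaluation intact? Arguments passed along via `...` remain unevaluated promises, so a `substitute()` call in the final function still sees the original expression. A minimal sketch without data.table, where `show_args` and `apply_all` are hypothetical stand-ins for `[.data.table` and `[.dd`:

```r
# Hypothetical stand-in for `[.data.table`: it only reports the expressions
# it received for j and by, using substitute() rather than evaluating them
show_args <- function(x, j, by) {
  list(j = substitute(j), by = substitute(by))
}

# Same shape as the `[.dd` method: forward `...` through lapply()
apply_all <- function(x, ...) lapply(x, show_args, ...)

res <- apply_all(list(1, 2), speed * 2, by = speed)
res[[1]]$j   # still the unevaluated expression; `speed` never had to exist
```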

Here are some examples of the approach:

car_list[, .N, speed]
# lapply(car_list, '[', j = .N, by = speed)
# or
# lapply(car_list, '[', , .N, speed)
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
...
car_list[, speed * 2]
# lapply(car_list, '[', j = speed*2)
# or
# lapply(car_list, '[', , speed*2)
[[1]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

[[2]]
 [1]  8  8 14 14 16 18 20 20 20 22 22 24 24 24 24 26 26
[18] 26 26 28 28 28 28 30 30 30 32 32 34 34 34 36 36 36
[35] 36 38 38 38 40 40 40 40 40 44 46 48 48 48 48 50

car_list[, .(.N, max(dist)), speed]
# lapply(car_list, '[', j = list(.N, max(dist)), by = speed)
# or 
# lapply(car_list, '[', ,.(.N, max(dist)), speed)

[[1]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

[[2]]
    speed N  V2
 1:     4 2  10
 2:     7 2  22
 3:     8 1  16
 4:     9 1  10
 5:    10 3  34
...

This also works with the `:=` operator:

car_list[, `:=` (more_speed = speed+5)]
# or
# lapply(car_list, '[', , `:=` (more_speed = speed+5))

car_list
[[1]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
...

[[2]]
    speed dist more_speed
 1:     4    2          9
 2:     4   10          9
 3:     7    4         12
 4:     7   22         12
 5:     8   16         13
Cole
  • It's sort of missing the point. It's not for me, it's for the end user, so I need it to be callable like this dt_list[...]. – xiaodai Jul 23 '19 at 07:12
  • See the edit: something like `` `[.dd` <- function(x, ...) lapply(x, '[', ...) `` works as well, although I can't format the code in this comment box very well. P.S. I agree; I thought I would get downvoted originally for not really answering the question :) – Cole Jul 24 '19 at 02:03


The first base R option is `substitute(...())` followed by `do.call`:

library(data.table)
cars1 = setDT(copy(cars))
cars2 = setDT(copy(cars))
cars2[, speed := sort(speed, decreasing = TRUE)]

car_list = list(cars1, cars2)
class(car_list) <- "dd"

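`[.dd` <- function(x,...) {
  a <- substitute(...()) # the unevaluated arguments, similar to an alist
  expr <- quote(x[[i]])  # placeholder for the i-th data.table
  expr <- c(expr, a)     # argument list: x[[i]] followed by the captured args
  res <- list()
  for (i in seq_along(x)) {
    # do.call evaluates the call in this frame, where x and i exist;
    # plain `[` works here too and avoids the `:::`
    res[[i]] <- do.call(data.table:::`[.data.table`, expr)
  }
  res
}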

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE

The second base R option is to use `match.call`, modify the call, and then evaluate it (you can find this approach in `lm`):

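`[.dd` <- function(x,...) {
  thecall <- match.call()         # the call with its unevaluated arguments
  thecall[[1]] <- quote(`[`)      # call the generic `[` instead of `[.dd`
  thecall[[2]] <- quote(x[[i]])   # operate on the i-th data.table
  res <- list()
  for (i in seq_along(x)) {
    res[[i]] <- eval(thecall)     # evaluated here, where x and i are defined
  }
  res
}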

all.equal(
  car_list[,.N, by = speed],
  list(cars1[,.N, by = speed], cars2[,.N, by = speed])
)
#[1] TRUE

all.equal(
  car_list[, speed*2],
  list(cars1[, speed*2], cars2[, speed*2])
)
#[1] TRUE
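The `match.call` variant rewrites the recorded call instead of rebuilding the argument list. The same pattern with a hypothetical `inner` function in place of `[` (so it runs without data.table) shows which parts of the call get swapped:

```r
# Hypothetical stand-in for `[`: records its x value and the unevaluated j/by
inner <- function(x, j, by) list(x = x, j = substitute(j), by = substitute(by))

outer_dd <- function(x, ...) {
  thecall <- match.call()        # outer_dd(x = ..., <args as written>)
  thecall[[1]] <- quote(inner)   # swap the function being called
  thecall[[2]] <- quote(x[[i]])  # swap the object for the i-th element
  res <- vector("list", length(x))
  for (i in seq_along(x)) {
    res[[i]] <- eval(thecall)    # eval() defaults to this frame
  }
  res
}

out <- outer_dd(list(10, 20), speed * 2, by = speed)
```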

I haven't tested whether these approaches make a deep copy when you use `:=`.

Roland
  • May I ask what `...()` is? I looked through R Internals and the R Language Definition and found only `...` (dot-dot-dot) arguments. – chinsoon12 Jul 22 '19 at 02:30
  • 1
    I learned this on the R-devel mailing list. You can use `substitute` to create a call like this `foo <- function(x) {substitute(x())}; foo(bar)`. Now, instead of `x` we substitute `...`. I don't know exactly why we then don't get a call object , but a call is a pairlist internally and that's what we get. If you want something easier to understand, use what Hadley suggests in his book: `eval(substitute(alist(...)))` – Roland Jul 22 '19 at 07:01
  • 1
    thanks. found by trial and error that `quote(...)` works as well. will read up more on nse – chinsoon12 Jul 22 '19 at 08:02
  • It can handle `car_list[, abc := speed*3]`, whereas @Alexis's can't. Hence I accepted it as the solution! – xiaodai Jul 23 '19 at 07:20
  • Is there a way to not use `:::`? CRAN won't accept it. Actually, changing it to `[` also works! – xiaodai Jul 23 '19 at 07:25
  • The first approach also works with just `` `[` `` (as does the second approach). – Roland Jul 23 '19 at 07:30
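The two capture idioms discussed in these comments can be compared directly. In this sketch, `dots_pairlist` and `dots_alist` are hypothetical helper names; the captured arguments are then spliced into a `[` call, similar to what `do.call` assembles in the first approach (`dt` is only a placeholder symbol and is never evaluated):

```r
# substitute(...()) returns the unevaluated arguments (not a call object)
dots_pairlist <- function(...) substitute(...())
# The alternative from the comments: an ordinary list of expressions
dots_alist    <- function(...) eval(substitute(alist(...)))

a <- dots_pairlist(.N, by = speed)
b <- dots_alist(.N, by = speed)

# Both hold the same expressions with the same argument names
identical(as.list(a), b)

# Splice them into a `[` call without evaluating anything
cl <- as.call(c(as.name("["), quote(dt), b))
deparse(cl)
```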

The suggestion in my comment wasn't complete. You can indeed use rlang to support tidy evaluation, but since data.table itself doesn't support it directly, you're better off using expressions instead of quosures, and you need to build the complete final expression before calling eval_tidy:

`[.dd` <- function(x, ...) {
  code <- rlang::enexprs(...)
  lapply(x, function(dt) {
    ex <- rlang::expr(dt[!!!code])
    rlang::eval_tidy(ex)
  })
}
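To see what `eval_tidy` actually receives, the expression-building step can be isolated. A minimal sketch, with `build_call` as a hypothetical helper and `dt` left as an unevaluated placeholder symbol:

```r
# Capture bare expressions and splice them into a complete dt[...] call;
# nothing is evaluated, so `dt`, `.N` and `speed` need not exist
build_call <- function(...) {
  code <- rlang::enexprs(...)
  rlang::expr(dt[!!!code])
}

ex <- build_call(.N, by = speed)
ex  # a plain call with no quosures (tildes) inside
```

Because the spliced pieces are plain symbols and calls, data.table sees exactly what the user typed once `eval_tidy` runs the finished call.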

car_list[, .N, by = speed]
[[1]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1

[[2]]
    speed N
 1:     4 2
 2:     7 2
 3:     8 1
 4:     9 1
 5:    10 3
 6:    11 2
 7:    12 4
 8:    13 4
 9:    14 4
10:    15 3
11:    16 2
12:    17 3
13:    18 4
14:    19 3
15:    20 5
16:    22 1
17:    23 1
18:    24 4
19:    25 1
Alexis
  • It's a great solution, but it couldn't handle `car_list[, abc := speed*3]`. – xiaodai Jul 23 '19 at 07:21
  • 1
    @xiaodai If you want to use `:=`, you'd need to pass `.unquote_names = FALSE` to `enexprs`. `rlang` also uses `:=` for its own purposes, so interaction between it and `data.table` requires special considerations. – Alexis Jul 23 '19 at 07:44
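The `:=` behaviour described in this last comment can be checked without data.table. In this sketch the helper names are hypothetical; by default `enexprs()` treats `abc := ...` as a named argument, while `.unquote_names = FALSE` keeps the `:=` call intact so it can be spliced into `j`:

```r
default_capture <- function(...) rlang::enexprs(...)
keep_walrus     <- function(...) rlang::enexprs(..., .unquote_names = FALSE)

# Default: rlang consumes `:=` and turns `abc` into the argument name
d <- default_capture(abc := speed * 3)
d$abc

# With .unquote_names = FALSE the whole `abc := speed * 3` call survives
k <- keep_walrus(abc := speed * 3)
k[[1]]

# It can then be spliced into the j position of a data.table-style call
rlang::expr(dt[, !!!k])
```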