4

I have a data.table object on which I'd like to do a simple lookup:

print(class(dt))
print(colnames(dt))
print(dt[region == "UK", ])

In my interactive R session, this chunk of code does exactly what it should.

[1] "data.table" "data.frame"
[1] "region"            "site"              "visit"            
[4] "connectionfailure" "dnserror"          "http404"          
# ... output ...

In a non-interactive scripted session, I get a confusing error:

[1] "data.table" "data.frame"
[1] "region"            "site"              "visit"            
[4] "connectionfailure" "dnserror"          "http404"          
Error in `[.data.frame`(x, i, j) : object 'region' not found

It looks like R is dispatching dt[...] to [.data.frame rather than to [.data.table. Any thoughts as to why?

Dhskjlkakdh
  • 2
    Most likely you don't have `library(data.table)` set up in your batch execution. Could be something based on your user profile auto-loading `data.table`, but not batch exec. – BrodieG Jan 23 '14 at 21:52
  • @BrodieG, submit as answer? – Ricardo Saporta Jan 23 '14 at 21:55
  • BrodieG, to be clear: that would explain `"data.table"` showing as a class for `dt`, but the dispatch not working? – Dhskjlkakdh Jan 23 '14 at 21:57
  • @RicardoSaporta, with the extra work now I don't feel bad posting as an answer ;). sjbach, hopefully the answer addresses your question. – BrodieG Jan 23 '14 at 22:03
  • It's probably because the methods package isn't loaded automatically when Rscript starts. – Joshua Ulrich Jan 23 '14 at 22:05
  • try running `print(data.table:::cedta())` in your code (this is the command `data.table` runs internally to check if it should dispatch to `data.frame`) - might help ruling out a few things – eddi Jan 24 '14 at 00:12

2 Answers

5

Most likely your batch execution never calls library(data.table). Your user profile may be auto-loading data.table in interactive sessions but not in batch execution. Also, just because an object has class data.table doesn't mean the package is loaded:

library(data.table)
dt <- data.table(a = 1:3)
# Unload the package; dt still carries its class attribute
detach("package:data.table", unload = TRUE)
class(dt)
# [1] "data.table" "data.frame"
setkey(dt, a)
# Error: could not find function "setkey"
library(data.table)
setkey(dt, a)
# works
BrodieG
  • This is a helpful answer, but unfortunately it doesn't lead to a solution in my case. `setkey` and other `data.table` functions are defined and can be called without issue. I don't know why, but the problem appears to be isolated to `[`. – Dhskjlkakdh Jan 23 '14 at 23:37
  • 1
    @sjbach Do you see the `data.table` method when you do `methods("[")`? Are you definitely loading data table explicitly? – BrodieG Jan 23 '14 at 23:42
  • An excellent follow-up question. Alas, it is defined during the batch execution: `... [5] [.data.table* ...` (printed directly prior to the indexing that throws the error) – Dhskjlkakdh Jan 23 '14 at 23:51
  • Ah. With `options(datatable.verbose = TRUE)`, I get this warning during the indexing call: `cedta decided '' wasn't data.table aware`. I don't know how to resolve that, but at least I have a lead. – Dhskjlkakdh Jan 24 '14 at 00:15
  • @sjbach, cold comfort for you, but this works fine for me in batch mode. I would recommend you try eddi's suggestion and also reduce your script to the simplest possible reproducible example (i.e. one line is `library(data.table)`, the next creates the `data.table`, the next is your subset command, and that's it). If that works, start adding complexity towards your original file and see what fails. – BrodieG Jan 24 '14 at 00:18
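
The checks suggested in these comments can be combined into one minimal batch script. This is only a sketch, assuming data.table is installed; the file name `check_dt.R` and the table contents are made up for illustration, and `cedta()` is an internal function, so its behaviour may differ between data.table versions.

```r
# check_dt.R (hypothetical file name); run with: Rscript check_dt.R
library(data.table)

# Ask data.table to report its internal decisions, including the
# data.table-awareness check mentioned in the comments
options(datatable.verbose = TRUE)

# Made-up data reusing some column names from the question
dt <- data.table(region = c("UK", "US"), site = c("a", "b"), visit = c(1L, 2L))

# List the registered methods for `[`; a data.table method should appear
print(methods("["))

# Internal awareness check suggested by eddi (accessed with :::)
print(data.table:::cedta())

# The subset that fails in the question
uk <- dt[region == "UK", ]
print(uk)
```

If this script works from the command line but the same subset fails inside your own code, the difference is likely in the environment the call is made from rather than in the table itself.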
3

For posterity: in batch execution, the problematic code is loaded from a custom package, and I had neglected to include import(data.table) in that package's NAMESPACE file. I could be wrong, but I think this would still have worked if data.table didn't explicitly check that the environment calling [.data.table imports data.table in its namespace; in that sense, data.table is perhaps overreaching. Still, I'm sure there must be a good reason for this check.

EDIT: More info about that explicit check here:
Using data.table package inside my own package
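
For reference, the fix described above amounts to one line in the package's NAMESPACE file. This is a fragment of that file rather than a runnable script, and it assumes data.table is also listed under Imports (or Depends) in the package's DESCRIPTION file:

```r
# NAMESPACE file of the custom package (fragment, not a standalone script).
# Importing data.table marks the package's functions as data.table-aware,
# so dt[region == "UK", ] is evaluated with data.table semantics.
import(data.table)
```

After changing NAMESPACE, the package has to be reinstalled before the batch script picks up the change.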

Dhskjlkakdh
  • 1
    If your package is not *data.table-aware*, `[.data.table` will dispatch to `[.data.frame`. Look at the first few lines of `data.table:::\`[.data.table\`` – Arun Jan 24 '14 at 00:51