
data.table is a wonderful package, which, alas, generates unwarranted warnings from checkUsage (the code comes from here and here):

> library(data.table)
> library(codetools)
> library(compiler)
> compiler::enableJIT(3)
> dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b=1:10, c=11:20, d=21:30, key="a")
> my.func <- function (dt) {
  dt.out <- dt[, lapply(.SD, sum), by = a]
  dt.out[, count := dt[, .N, by=a]$N]
  dt.out
}
> checkUsage(my.func)
<anonymous>: no visible binding for global variable ‘.SD’ (:2)
<anonymous>: no visible binding for global variable ‘a’ (:2)
<anonymous>: no visible binding for global variable ‘count’ (:3)
<anonymous>: no visible binding for global variable ‘.N’ (:3)
<anonymous>: no visible binding for global variable ‘a’ (:3)
> my.func(dt)
Note: no visible binding for global variable '.SD' 
Note: no visible binding for global variable 'a' 
Note: no visible binding for global variable 'count' 
Note: no visible binding for global variable '.N' 
Note: no visible binding for global variable 'a' 
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5

The warnings about a can be avoided by replacing by=a with by="a", but how do I deal with the other 3 warnings?

This matters to me because these warnings clutter the screen and obscure legitimate warnings. Since the warnings are issued when my.func is invoked (with the JIT compiler enabled), not just by checkUsage, I am inclined to call this a bug.

sds
  • Query: those are objects inside `my.func`, so why should they be considered `global` variables? – Carl Witthoft Apr 23 '13 at 14:53
  • See [this](http://stackoverflow.com/a/15411032/967840), and [this](http://stackoverflow.com/a/8096882/967840) – GSee Apr 23 '13 at 14:57
  • I don't know `checkUsage`. If there's something I can change in `data.table` please let me know. Or maybe there's an option to `checkUsage`. – Matt Dowle Apr 23 '13 at 15:12
  • @MatthewDowle: this is not just `checkUsage`, the JIT compiler issues the warning too – sds Apr 23 '13 at 15:18
  • @sds In about 10 seconds of looking at `?enableJIT` I found `options` and `suppressUndefined`. Did you find it? Raising it as a `data.table` bug already seems a little quick. – Matt Dowle Apr 23 '13 at 15:50
  • @MatthewDowle: I do _not_ want to `suppressUndefined`. I _do_ want to see those warnings, but only the legitimate ones. – sds Apr 23 '13 at 16:02
  • @sds I replied to your reply that this note (not warning) is an old chestnut in the R community and probably beyond my ability to resolve. But I've left your `data.table` bug report open. – Matt Dowle Apr 23 '13 at 17:49
  • It seems unlikely that byte-code compiling code that uses data.table will be of any benefit, since pretty much all the data.table code already uses compiled C code. – hadley Apr 24 '13 at 12:57
  • @hadley: my primary reason for compiling is the same as `checkUsage`: error detection before execution – sds Apr 24 '13 at 13:06
  • @sds Given that `checkUsage` is purely heuristic-driven, and it doesn't work well with any package that uses non-standard evaluation (e.g. data.table, ggplot2, plyr, ...), I doubt that the benefits outweigh the costs. You'd be better off relying on unit testing. – hadley Apr 24 '13 at 13:53
  • Unit testing is orthogonal to this question. I can unit-test the heck out of something, and still not notice that it's doing unsavory things to the global namespace. – Ken Williams May 03 '13 at 20:18
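The suppression options mentioned in the comments need not be all-or-nothing. A minimal sketch, assuming the codetools and compiler packages that ship with R: `codetools::checkUsage()` takes a `suppressUndefined` argument that can be a character vector of names, so only the data.table special symbols are silenced and a genuinely undefined name (the hypothetical `typo` below) is still reported. The last call assumes the compiler's option of the same name, which governs the notes printed under `enableJIT()`, accepts the same kind of vector.

```r
library(codetools)

# Hypothetical stand-in for my.func: .SD and .N are supplied by data.table
# at run time, while `typo` is a genuinely undefined variable.
f <- function(dt) dt[, lapply(.SD, sum)] + .N + typo

# Collect the notes instead of printing them (checkUsage's default report is cat).
notes <- character(0)
collect <- function(msg) notes <<- c(notes, msg)

# Silence only the data.table special symbols; other undefined names
# are still reported.
checkUsage(f, report = collect, suppressUndefined = c(".SD", ".N"))
print(notes)

# Assumption: the byte-code compiler's option of the same name, which
# controls the notes printed under enableJIT(), takes a character vector too.
compiler::setCompilerOptions(
  suppressUndefined = c(compiler::getCompilerOption("suppressUndefined"), ".SD", ".N")
)
```

This keeps the note about `typo` while dropping the ones about `.SD` and `.N`, which is closer to "only the legitimate ones" than turning suppression on wholesale.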

2 Answers


UPDATE: Now resolved in v1.8.11. From NEWS:

.SD,.N,.I,.GRP and .BY are now exported (as NULL). So that NOTEs aren't produced for them by R CMD check or codetools::checkUsage via compiler::enableJIT(). utils::globalVariables() was considered, but exporting chosen. Thanks to Sam Steingold for raising, #2723.
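Exporting the symbols as NULL appears to work because checkUsage only asks whether a name has a visible binding from the function's environment, not what it is bound to. A minimal sketch of that idea, where the environment `exports` is a hypothetical stand-in for the search-path entry that library(data.table) adds:

```r
library(codetools)

# Hypothetical function using data.table's special symbols.
g <- function(dt) dt[, lapply(.SD, sum)] + .N

# Hypothetical stand-in for what data.table now exports: an environment in
# which .SD and .N exist (bound to NULL), placed on g's enclosing chain.
exports <- new.env(parent = globalenv())
assign(".SD", NULL, envir = exports)
assign(".N", NULL, envir = exports)
environment(g) <- exports

notes <- character(0)
checkUsage(g, report = function(msg) notes <<- c(notes, msg))
print(notes)  # expected to be empty: both names now have a visible binding
```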

To resolve the notes for the column-name symbols count and a, wrap both in quotes (this works even on the LHS of :=). In a fresh R session (the notes are printed only the first time a function is compiled), the following now produces no notes:

$ R
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> require(data.table)
Loading required package: data.table
data.table 1.8.11  For help type: help("data.table")
> library(codetools)
> library(compiler)
> compiler::enableJIT(3)
[1] 0
> dt <- data.table(a=c(rep(3,5),rep(4,5)), b=1:10, c=11:20, d=21:30, key="a")
> my.func <- function (dt) {
  dt.out <- dt[, lapply(.SD, sum), by = "a"]
  dt.out[, "count" := dt[, .N, by="a"]$N]
  dt.out
}
> my.func(dt)
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5
> checkUsage(my.func)
> 
Matt Dowle

It appears that the only way at this time is:

my.func <- function (dt) {
  .SD <- .N <- count <- a <- NULL  # avoid inappropriate warnings
  dt.out <- dt[, lapply(.SD, sum), by = a]
  dt.out[, count := dt[, .N, by=a]$N]
  dt.out
}

That is, bind the variables reported as unbound globals to local values (here NULL).
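A minimal sketch checking this workaround with checkUsage, using hypothetical stand-in functions that do not need data.table to be loaded: `h1` refers to `.SD` and `.N` freely, while `h2` binds them to NULL first. In a real data.table call the local NULLs should be harmless, because data.table evaluates j in its own environment, where `.SD` and `.N` are defined.

```r
library(codetools)

# Hypothetical stand-ins that do not need data.table to be loaded.
# h1 refers to .SD and .N freely, as my.func originally did.
h1 <- function(dt) dt[, lapply(.SD, sum)] + .N
# h2 applies the workaround: bind the names locally (to NULL) first.
h2 <- function(dt) {
  .SD <- .N <- NULL  # avoid "no visible binding" notes
  dt[, lapply(.SD, sum)] + .N
}

# Count the notes checkUsage would print for a function.
count_notes <- function(fun) {
  notes <- character(0)
  checkUsage(fun, report = function(msg) notes <<- c(notes, msg))
  length(notes)
}

count_notes(h1)  # one note each for .SD and .N
count_notes(h2)  # no notes
```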

Thanks to @GSee for the links.

sds