I ran into a really strange problem with an aggregation that uses the data.table package. When I run it line by line in a script file, it works perfectly. It also works when I wrap it in a function in that script file.

But when I build my own R package and tag the same function with @export to make it callable, the code breaks. It also breaks when I leave out the tag and call the function from another exported function in the package.

Here is a small example data set. Keep in mind that to reproduce the problem, you have to start a new R package project, add the tagged function, and build the package.

The example just computes an aggregated mean of one variable by year and month.

# Example input data set df1
require(lubridate)
days = 365*2
date = seq(as.Date("2000-01-01"), length = days, by = "day")
year = year(date)
month = month(date)
x1 = cumsum(rnorm(days, 0.05)) 
df1 = data.frame(date, year, month, x1)

# Manual approach - called line by line. Works as expected
library(data.table)
df2 <- setDT(df1)[, lapply(.SD, mean), by=.(year, month), .SDcols = "x1"]
setDF(df2)
df2

# The aggregation function in the script file. 
testAggregationInScript <- function(df) {
  library(data.table)
  df2 <- setDT(df)[, lapply(.SD, mean), by=.(year, month), .SDcols = "x1"]
  setDF(df2)
  return(df2)
}

# Call the function of the script file. Works as expected
df3.script <- testAggregationInScript(df1)


# -----------------
# In the test R package build the test aggregation function

#' If the function is in a package and built and then called, it breaks
#' 
#' @export
testAggregationInPackage <- function(df) {
  library(data.table)
  df2 <- setDT(df)[, lapply(.SD, mean), by=.(year, month), .SDcols = "x1"]
  setDF(df2)
  return(df2)
}

# -----------------

# -----------------
# Back in the R script

# Call the function from the R package in an R script
# Here the code fails with a strange error, although everything seems the same
library(testRpackage)
df3.package <- testAggregationInPackage(df1)

The error message in the console is very vague:

Error in .subset(x, j) : invalid subscript type 'list'
Called from: `[.data.frame`(x, i, j)

I really don't get it. It seems that the input is not the same. Maybe R changes the input format for package functions when the arguments are passed along. Or it is just a silly mistake on my side.

I tested other aggregation functions, e.g. from the dplyr package, and inside the package they work just as the data.table version normally should. But I can't switch to another approach; I have to use the data.table package.

So I need your help. Thanks in advance, and don't hesitate to ask or comment.

Frank
Timo Wagner
    The issue seems to be linked to [this post](http://stackoverflow.com/questions/23252231/r-data-table-breaks-in-exported-functions?rq=1) and [here](https://github.com/hadley/devtools/issues/192). I will check it out... – Timo Wagner Apr 28 '17 at 20:41

1 Answer

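There still seems to be an issue with the devtools package, as you can read in the devtools issue linked in my comment on the question (https://github.com/hadley/devtools/issues/192). What gave me a good hint was the earlier Stack Overflow question linked in the same comment.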

In summary the approach is as follows:

  1. Add #' @import data.table to the roxygen comments in the package's R file that contains the function.
  2. Add an import(data.table) statement to the NAMESPACE file.
  3. Although I already had Imports: data.table, I additionally added Depends: data.table to the DESCRIPTION file.
  4. Then rebuild and reinstall the package.
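
For illustration, here is a minimal sketch of what the package function could look like after steps 1 and 2. It assumes roxygen2 generates the NAMESPACE file, so the `@import data.table` tag is what produces the `import(data.table)` line when the documentation is regenerated. The file name in the first comment is hypothetical. I also swapped `setDT()` for `as.data.table()` and dropped the `library()` call inside the function; those are my own choices for the sketch, not part of the fix itself.

```r
# R/testAggregationInPackage.R  (hypothetical file name inside the package)

#' Mean of x1 by year and month
#'
#' @param df A data.frame with the columns year, month and x1.
#' @return A data.frame with one row per year/month and the mean of x1.
#' @import data.table
#' @export
testAggregationInPackage <- function(df) {
  # Assumption for this sketch: convert a copy instead of calling setDT(),
  # so the caller's data.frame is not turned into a data.table by reference.
  dt <- data.table::as.data.table(df)

  # Same aggregation as in the question.
  res <- dt[, lapply(.SD, mean), by = .(year, month), .SDcols = "x1"]

  # Hand back a plain data.frame, as the original function does.
  data.table::setDF(res)
  res
}
```

With the namespace import in place, data.table should treat the package as data.table-aware, so the `[` call is no longer passed on to `[.data.frame`. As far as I understand the FAQ, `Imports: data.table` together with the namespace import should be enough, so the extra `Depends:` entry from step 3 may not be strictly required.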
Timo Wagner
  • Maybe worth re-referencing the FAQ: https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-faq.html#i-have-created-a-package-that-depends-on-data.table.-how-do-i-ensure-my-package-is-data.table-aware-so-that-inheritance-from-data.frame-works – Frank Apr 28 '17 at 23:51