I ran into a really strange problem with an aggregation function that uses the data.table
package. When I run it line by line in a script file, it works perfectly. It also works when I wrap it in a function in that script file.
But when I build my own R package and tag the same function with @export
to make it callable, the code breaks. It also breaks when I leave the function untagged and call it from another exported function in the package.
Here is a small example data set. Note that to reproduce the problem you have to start a new R package
project, add the tagged function, and build the package.
The example just computes an aggregated mean of one variable, grouped by year and month.
# Example input data set df1
library(lubridate)
days = 365*2
date = seq(as.Date("2000-01-01"), length = days, by = "day")
year = year(date)
month = month(date)
x1 = cumsum(rnorm(days, 0.05))
df1 = data.frame(date, year, month, x1)
# Manual approach - called line by line. Works as expected
library(data.table)
df2 <- setDT(df1)[, lapply(.SD, mean), by=.(year, month), .SDcols = "x1"]
setDF(df2)
df2
# The aggregation function in the script file.
testAggregationInScript <- function(df) {
  library(data.table)
  df2 <- setDT(df)[, lapply(.SD, mean), by=.(year, month), .SDcols = "x1"]
  setDF(df2)
  return(df2)
}
# Call the function of the script file. Works as expected
df3.script <- testAggregationInScript(df1)
# -----------------
# In the test R package build the test aggregation function
#' If the function is in a package and built and then called, it breaks
#'
#' @export
testAggregationInPackage <- function(df) {
  library(data.table)
  df2 <- setDT(df)[, lapply(.SD, mean), by=.(year, month), .SDcols = "x1"]
  setDF(df2)
  return(df2)
}
# -----------------
# Back in the R script
# Call the function from the R package in an R script
# Here the code fails with a strange error, although everything seems the same
library(testRpackage)
df3.package <- testAggregationInPackage(df1)
The error message in the console is very vague:
Error in .subset(x, j) : invalid subscript type 'list'
Called from: `[.data.frame`(x, i, j)
I really don't get it. The traceback shows `[.data.frame` being called, so it seems that the object is not treated as a data.table inside the package function. Maybe R
changes the input format for package functions when the parameters are passed along. Or it is just something stupid on my side.
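To narrow down whether the input really changes, here is a minimal sketch of the check I would put around the aggregation call. The toy data frame and the names df, dt and res are made up for illustration and are not part of my package; the sketch only assumes that data.table is installed.

```r
library(data.table)

# Hypothetical toy input standing in for df1 (made-up values)
df <- data.frame(year  = c(2000, 2000, 2001),
                 month = c(1, 1, 1),
                 x1    = c(1, 3, 5))

# Convert by reference and inspect what the object is at this point
dt <- setDT(df)
print(class(dt))
print(is.data.table(dt))

# Same aggregation as in the functions above: mean of x1 per year and month
res <- dt[, lapply(.SD, mean), by = .(year, month), .SDcols = "x1"]
print(res)
```

Run as a plain script, this reports a data.table and the aggregation goes through. Placing the same class checks inside testAggregationInPackage should show whether the object actually loses its class there or whether something else is going on.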
I tested other aggregation functions, e.g. from the dplyr
package, and inside the package they work the way the data.table
version normally should. But I can't switch to another approach; I have to use the data.table
package.
So I need your help. Thanks in advance, and don't hesitate to ask questions or comment.