I have created some functions that need to handle either a `disk.frame` or a `data.table` as input. I am getting errors from the `future` package used within `disk.frame` because an object is not found at execution time. I think this happens because `future` looks for the objects to pass to each worker in the global environment, and does not recognize the objects I have created in the function's execution environment. Super assignment (`<<-`) solves the issue, but is there a better, or more appropriate, way to use `disk.frame`s within functions?
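To check that hypothesis without `disk.frame`, here is a base-R toy model. `deferred_filter()` is a hypothetical stand-in, not `disk.frame`'s actual internals: it captures the filter expression unevaluated and evaluates it later with the global environment as its enclosure, which is roughly what a worker that only receives the expression would see. Under that assumption, a variable created inside the calling function is not found, while a global one is. The message differs from data.table's `.checkTypos` error, but the failed lookup is of the same kind.

```r
# Hypothetical toy evaluator (NOT disk.frame's real code): the condition is
# captured as code and evaluated later, with the global environment as the
# enclosure, much as a fresh worker process would see it.
deferred_filter <- function(data, cond) {
  expr <- substitute(cond)                               # capture code, not value
  keep <- eval(expr, envir = data, enclos = globalenv()) # columns first, then globals
  data[keep, , drop = FALSE]
}

# A variable local to the calling function is invisible to that lookup
filter_in_function <- function(vMin, vMax, d) {
  fVals <- c(vMin, vMax)
  deferred_filter(d, Petal.Length >= fVals[1] & Petal.Length <= fVals[2])
}

# A global variable is found, mirroring the "working" case in the question
filterVals <- c(1.4, 3.5)
ok <- deferred_filter(iris, Petal.Length >= filterVals[1] & Petal.Length <= filterVals[2])

err <- tryCatch(filter_in_function(1.4, 3.5, iris),
                error = function(e) conditionMessage(e))
print(err)
# [1] "object 'fVals' not found"
```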
I am using the most recent versions of `disk.frame` (0.3.5) and `future` (1.17.0) with R 4.0.0 on Windows 10 x64.
Here is a reproducible example using the `iris` data set:
Setup
```r
# Load data
data("iris")
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1: 5.1 3.5 1.4 0.2 setosa
# 2: 4.9 3.0 1.4 0.2 setosa
# 3: 4.7 3.2 1.3 0.2 setosa
# 4: 4.6 3.1 1.5 0.2 setosa
# 5: 5.0 3.6 1.4 0.2 setosa
# 6: 5.4 3.9 1.7 0.4 setosa

# Set up disk.frame
library(disk.frame)
disk.frame::setup_disk.frame()
options(future.globals.maxSize = Inf)

# Make the disk.frame
df <- disk.frame::as.disk.frame(df = iris)
```
Working disk.frame operation
This works because `filterVals` is in the global environment.
```r
# data.table-style operations: row-wise filter with a vector
valMin <- 1.4
valMax <- 3.5
filterVals <- c(valMin, valMax)

# data.table-style filter on the disk.frame
means_filter <- df[Petal.Length %between% filterVals, ]
```
Perform the disk.frame operation within a function
```r
# data.table-style operations on the disk.frame inside a function
f <- function(vMin, vMax, dskF){
  fVals <- c(vMin, vMax)
  dskF[Petal.Length %between% fVals, ]
}

# This throws an error
means_filter_func <- f(vMin = valMin, vMax = valMax, dskF = df)
# Error in .checkTypos(e, names_x) :
#   Object 'fVals' not found amongst Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species

# Same function, but with super assignment (<<-)
f2 <- function(vMin, vMax, dskF){
  fVals <<- c(vMin, vMax)
  dskF[Petal.Length %between% fVals, ]
}

# This works
means_filter_func <- f2(vMin = valMin, vMax = valMax, dskF = df)

# Cleanup
disk.frame::delete(df)
```
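One alternative to `<<-`, sketched here on a hypothetical toy evaluator rather than on `disk.frame` itself, is to splice the *value* of `fVals` into the expression with base R's `bquote()`, so nothing has to be looked up by name when the expression is finally evaluated. The block also shows the side effect of `<<-`: the helper variable remains in the global environment after the call (named `fVals_leak` here only to keep the two versions apart).

```r
# Hypothetical toy evaluator (NOT disk.frame's real code): the captured
# condition is evaluated with the global environment as its enclosure.
deferred_filter <- function(data, cond) {
  expr <- substitute(cond)
  keep <- eval(expr, envir = data, enclos = globalenv())
  data[keep, , drop = FALSE]
}

# The `<<-` workaround: works, but the helper leaks into the global
# environment and stays there after the function returns.
f_superassign <- function(vMin, vMax, d) {
  fVals_leak <<- c(vMin, vMax)
  deferred_filter(d, Petal.Length >= fVals_leak[1] & Petal.Length <= fVals_leak[2])
}

# Alternative: .() inside bquote() inserts the numbers themselves into the
# call, so the deferred evaluation never needs the name `fVals`.
f_inline <- function(vMin, vMax, d) {
  fVals <- c(vMin, vMax)
  cl <- bquote(deferred_filter(d, Petal.Length >= .(fVals[1]) &
                                  Petal.Length <= .(fVals[2])))
  eval(cl)  # evaluated in f_inline's frame, so `d` is still found
}

res_inline <- f_inline(1.4, 3.5, iris)
res_leak <- f_superassign(1.4, 3.5, iris)
```

Applied to the question's function, the same idea would be `eval(bquote(dskF[Petal.Length %between% .(fVals), ]))` inside `f`. I have not verified that the `[` method in `disk.frame` 0.3.5 handles an inlined vector this way, so treat it as an assumption to test.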