I have created some functions that need to accept either a disk.frame or a data.table as input. When disk.frame runs an operation through the future package, I get an error because an object is not found at execution time. I think this happens because future looks for the objects to pass to each worker in the global environment, and does not see the objects I have created in the function's execution environment. Super assignment (`<<-`) solves the issue, but is there a better, or more appropriate, way to use disk.frames within functions?
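The suspected mechanism can be reproduced in base R, without disk.frame or future. The sketch below is an analogy under that assumption, not disk.frame's actual implementation: `nse_filter()`, `filter_global()` and `filter_local()` are hypothetical helpers, and the `enclos` argument stands in for whatever environment is used to look up names that are not columns. When that environment is the global one, a vector created inside the calling function (here `fVals`, mirroring the example below) is invisible; when it is the function's own environment, the lookup succeeds.

```r
# Hypothetical helper, NOT disk.frame's implementation: it captures a filter
# expression (NSE) and evaluates it with the data's columns first and `enclos`
# for every other name. `enclos` stands in for what a worker can see.
nse_filter <- function(data, cond, enclos) {
  expr <- substitute(cond)          # the unevaluated filter expression
  keep <- eval(expr, data, enclos)  # columns from `data`, other names from `enclos`
  data[keep, , drop = FALSE]
}

# Non-column names fall back to the global environment, so `fVals`, which
# exists only inside this function, cannot be found.
filter_global <- function(d, vMin, vMax) {
  fVals <- c(vMin, vMax)
  nse_filter(d, Petal.Length >= fVals[1] & Petal.Length <= fVals[2], globalenv())
}

# Non-column names fall back to this function's own environment, so `fVals`
# is found.
filter_local <- function(d, vMin, vMax) {
  fVals <- c(vMin, vMax)
  nse_filter(d, Petal.Length >= fVals[1] & Petal.Length <= fVals[2], environment())
}

# Capture the error message instead of stopping.
msg <- tryCatch(filter_global(iris, 1.4, 3.5), error = conditionMessage)
print(msg)

res <- filter_local(iris, 1.4, 3.5)
print(nrow(res))
```

If a global `fVals` already exists (for example after running a version of the function that uses `<<-`), `filter_global()` silently picks up that global value instead of failing, which is one reason super assignment is fragile.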

I am using the latest versions of disk.frame (0.3.5) and future (1.17.0) with R 4.0.0 on Windows 10 x64.

Here is a reproducible example using the iris data set:

Setup

#Load data
data("iris")
head(iris)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1:          5.1         3.5          1.4         0.2  setosa
# 2:          4.9         3.0          1.4         0.2  setosa
# 3:          4.7         3.2          1.3         0.2  setosa
# 4:          4.6         3.1          1.5         0.2  setosa
# 5:          5.0         3.6          1.4         0.2  setosa
# 6:          5.4         3.9          1.7         0.4  setosa

#Setup disk.frame
library(disk.frame)
disk.frame::setup_disk.frame()
options(future.globals.maxSize = Inf)

#Make the disk.frame
df <- disk.frame::as.disk.frame(df = iris)

Working disk.frame operation

This works because filterVals is in the global environment.

#data.table style operations - row-wise filter with vector
valMin <- 1.4
valMax <- 3.5
filterVals <- c(valMin, valMax) 

#data.table style filter with disk.frame
means_filter <- df[Petal.Length %between% filterVals, ]

Perform the disk.frame operation within a function

#data.table style operations on the disk.frame in a function 
f <- function(vMin, vMax, dskF){
  fVals <- c(vMin, vMax)
  dskF[Petal.Length %between% fVals, ]
}

#This will throw an error
means_filter_func <- f(vMin = valMin, vMax = valMax, dskF = df)
# Error in .checkTypos(e, names_x) : 
#   Object 'fVals' not found amongst Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species

#Same function but with super assignment
f2 <- function(vMin, vMax, dskF){
  fVals <<- c(vMin, vMax)
  dskF[Petal.Length %between% fVals, ]
}
#This works
means_filter_func <- f2(vMin = valMin, vMax = valMax, dskF = df)

#Cleanup
disk.frame::delete(df)
pmbrophy
That's an issue. It can only be fixed with https://github.com/xiaodaigh/disk.frame/issues/280. The problem is bad NSE code on my part. I would appreciate tips on how to make it better; I am doing the research now. I don't have a good solution except to wait for disk.frame v0.4. – xiaodai Jul 26 '20 at 08:14
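One general way to avoid both the lookup failure and `<<-` is to splice the values into the expression before it is evaluated, so the expression no longer refers to a local variable at all. Base R's `bquote()` replaces each `.(x)` with the current value of `x`. The sketch below reuses a hypothetical `nse_filter()` helper that, like a worker, only sees the global environment; `filter_spliced()` is likewise hypothetical and only illustrates the technique.

```r
# Hypothetical helper, NOT disk.frame's implementation: names that are not
# columns are looked up only in the global environment, like on a worker.
nse_filter <- function(data, cond, enclos = globalenv()) {
  expr <- substitute(cond)          # the unevaluated filter expression
  keep <- eval(expr, data, enclos)  # columns from `data`, other names from `enclos`
  data[keep, , drop = FALSE]
}

# Hypothetical replacement for the `<<-` version: bquote() substitutes the
# current values of fVals[1] and fVals[2] into the call before it runs, so the
# filter expression contains plain numbers and no reference to `fVals`.
filter_spliced <- function(d, vMin, vMax) {
  fVals <- c(vMin, vMax)
  cl <- bquote(
    nse_filter(d, Petal.Length >= .(fVals[1]) & Petal.Length <= .(fVals[2]))
  )
  eval(cl)  # evaluated in this function's frame, so `d` is found here
}

res <- filter_spliced(iris, 1.4, 3.5)
print(nrow(res))
```

Applied to the question's function, the analogous change would be `eval(bquote(dskF[Petal.Length %between% .(fVals), ]))` inside `f()`. This is untested: whether disk.frame 0.3.5 accepts the spliced vector depends on how its `[` method captures the expression, so per the comment above it may still fail until the NSE handling is fixed.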