I want to create a large lookup table of key value pairs, attempting it like this:
# actual use case is length ~5 million
key <- do.call(paste0, Map(stringi::stri_rand_strings, n=2e5, length = 16))
val <- sample.int(750, size = 2e5, replace = T)
make_dict <- function(keys, values){
require(rlang)
e <- new.env(size = length(keys))
l <- list2(!!!setNames(values, keys))
list2env(l, envir = e, hash = T) # problem in here...?
}
d <- make_dict(key, val)
Problem
When make_dict
is run it throws Error: protect(): protection stack overflow
. Specifically in RStudio when the input is a vector of length is greater than 49991, which seems very similar to this stackoverflow post.
However, when I run accessor functions to grab some of the values, it seems that make_dict
ran fine after all, as I can't find any oddities in its result:
`%||%` <- function(x,y) if(is.null(x)) y else x
grab <- function(...){
vector("integer", length(..2)) |>
(\(.){. = Vectorize(\(e, x) e[[x]] %||% NA_integer_, list("x"), T, F)(..1, ..2); .})()
}
out <- vector("integer", length(key))
out <- grab(d, sample(key)) # using sample to scramble the keys
anyNA(out) | !lobstr::obj_size(out) == lobstr::obj_size(val)
[1] FALSE
Running the same code in RGui does not throw the error.
Oddities
- The
d
environment object does not appear in environment pane in RStudio for size > 5e4. - The R console returns swiftly back to > (signaling the function has finished), but is unresponsive until the error is thrown
- Error is thrown if manually setting
options(expressions = 5e5)
, or retaining the default value of 5000 - When the error is thrown is proportional to the size of the input vector
tryCatch(make_dict(key, val), error = function(e) e)
doesn't catch an error- The error also occurs if code is run from package (Packaged version available through
remotes::install_github("D-Se/minimal")
)
Question
What's going on here? How to troubleshoot such an error?
options(error = traceback)
as advised here didn't give any results. Inserting a browser()
after list2env
in the make_dict
function throws an error long after the browser has opened. A traceback()
gives the function .rs.describeObject
, which is used to generate the summary in the Environment pane, and can be found here.
traceback()
# .rs.describeObject
(function (env, objName, computeSize = TRUE)
{
obj <- get(objName, env)
hasNullPtr <- .Call("rs_hasExternalPointer", obj, TRUE, PACKAGE = "(embedding)")
if (hasNullPtr) {
val <- "<Object with null pointer>"
desc <- "An R object containing a null external pointer"
size <- 0
len <- 0
}
else {
val <- "(unknown)"
desc <- ""
size <- if (computeSize)
object.size(obj)
else 0
len <- length(obj)
}
class <- .rs.getSingleClass(obj)
contents <- list()
contents_deferred <- FALSE
if (is.language(obj) || is.symbol(obj)) {
val <- deparse(obj)
}
else if (!hasNullPtr) {
if (size > 524288) {
len_desc <- if (len > 1)
paste(len, " elements, ", sep = "")
else ""
if (is.data.frame(obj)) {
val <- "NO_VALUE"
desc <- .rs.valueDescription(obj)
}
else {
val <- paste("Large ", class, " (", len_desc,
format(size, units = "auto", standard = "SI"),
")", sep = "")
}
contents_deferred <- TRUE
}
else {
val <- .rs.valueAsString(obj)
desc <- .rs.valueDescription(obj)
if (class == "data.table" || class == "ore.frame" ||
class == "cast_df" || class == "xts" || class ==
"DataFrame" || is.list(obj) || is.data.frame(obj) ||
isS4(obj)) {
if (computeSize) {
contents <- .rs.valueContents(obj)
}
else {
val <- "NO_VALUE"
contents_deferred <- TRUE
}
}
}
}
list(name = .rs.scalar(objName), type = .rs.scalar(class),
clazz = c(class(obj), typeof(obj)), is_data = .rs.scalar(is.data.frame(obj)),
value = .rs.scalar(val), description = .rs.scalar(desc),
size = .rs.scalar(size), length = .rs.scalar(len), contents = contents,
contents_deferred = .rs.scalar(contents_deferred))
})(<environment>, "d", TRUE)