I want to create a large lookup table of key-value pairs, and am attempting it like this:

# actual use case is length ~5 million
key <- stringi::stri_rand_strings(n = 2e5, length = 16)
val <- sample.int(750, size = 2e5, replace = TRUE)

make_dict <- function(keys, values){
  require(rlang)
  e <- new.env(size = length(keys))
  l <- list2(!!!setNames(values, keys))
  list2env(l, envir = e, hash = T) # problem in here...?
}

d <- make_dict(key, val)
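For comparison, here is a minimal base-R sketch of the same idea without rlang. The name make_dict_base and the small sample data are hypothetical, and it assumes keys are unique, non-empty strings. It is not meant as a fix for the error described below, only as a smaller construction to test against.

```r
# Hypothetical base-R variant of make_dict: build a hashed environment
# and fill it from a named list in one call.
make_dict_base <- function(keys, values) {
  e <- new.env(hash = TRUE, size = length(keys))
  # list2env() returns the environment it filled
  list2env(as.list(setNames(values, keys)), envir = e)
}

# Small illustrative data set (made up for this sketch)
set.seed(1)
keys <- sprintf("key%05d", 1:1000)
vals <- sample.int(750, size = 1000, replace = TRUE)

d_small <- make_dict_base(keys, vals)
d_small[["key00042"]] == vals[42]
```

Subsetting an environment with [[ returns NULL for a key that is absent, which is what the accessor further down relies on.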

Problem

When make_dict is run, it throws Error: protect(): protection stack overflow. This happens specifically in RStudio, when the input vector is longer than 49991 elements, which seems very similar to this stackoverflow post.

However, when I run accessor functions to grab some of the values, it seems that make_dict ran fine after all, as I can't find any oddities in its result:

`%||%` <- function(x, y) if (is.null(x)) y else x
# look up every key in environment e; keys that are absent yield NA
grab <- function(e, keys) {
  vapply(keys, function(k) e[[k]] %||% NA_integer_, integer(1), USE.NAMES = FALSE)
}
out <- grab(d, sample(key)) # using sample to scramble the keys

anyNA(out) | !lobstr::obj_size(out) == lobstr::obj_size(val)
[1] FALSE
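As an alternative accessor, base R's mget() can fetch many keys from an environment in one call. The sketch below uses a tiny made-up environment and a hypothetical helper name, lookup; it assumes all stored values are integers.

```r
# Tiny illustrative environment (made up for this sketch)
e <- list2env(list(a = 1L, b = 2L), envir = new.env())

# Hypothetical helper: fetch several keys at once, NA for missing keys
lookup <- function(env, keys) {
  unlist(
    mget(keys, envir = env, ifnotfound = list(NA_integer_)),
    use.names = FALSE
  )
}

res <- lookup(e, c("b", "zzz", "a"))
res
```

Because ifnotfound is recycled, a single NA_integer_ covers every missing key.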

Running the same code in RGui does not throw the error.

Oddities

  1. The d environment object does not appear in the Environment pane in RStudio for size > 5e4.
  2. The R console returns swiftly to > (signaling that the function has finished), but is unresponsive until the error is thrown.
  3. The error is thrown whether options(expressions = 5e5) is set manually or the default value of 5000 is retained.
  4. The delay before the error is thrown is proportional to the size of the input vector.
  5. tryCatch(make_dict(key, val), error = function(e) e) doesn't catch the error.
  6. The error also occurs if the code is run from a package (packaged version available through remotes::install_github("D-Se/minimal")).

Question

What's going on here? How can I troubleshoot such an error?

options(error = traceback) as advised here didn't give any results. Inserting a browser() after list2env in the make_dict function throws an error long after the browser has opened. A traceback() gives the function .rs.describeObject, which is used to generate the summary in the Environment pane, and can be found here.

traceback()

# .rs.describeObject
(function (env, objName, computeSize = TRUE) 
   {
       obj <- get(objName, env)
       hasNullPtr <- .Call("rs_hasExternalPointer", obj, TRUE, PACKAGE = "(embedding)")
       if (hasNullPtr) {
           val <- "<Object with null pointer>"
           desc <- "An R object containing a null external pointer"
           size <- 0
           len <- 0
       }
       else {
           val <- "(unknown)"
           desc <- ""
           size <- if (computeSize) 
               object.size(obj)
           else 0
           len <- length(obj)
       }
       class <- .rs.getSingleClass(obj)
       contents <- list()
       contents_deferred <- FALSE
       if (is.language(obj) || is.symbol(obj)) {
           val <- deparse(obj)
       }
       else if (!hasNullPtr) {
           if (size > 524288) {
               len_desc <- if (len > 1) 
                   paste(len, " elements, ", sep = "")
               else ""
               if (is.data.frame(obj)) {
                   val <- "NO_VALUE"
                   desc <- .rs.valueDescription(obj)
               }
               else {
                   val <- paste("Large ", class, " (", len_desc, 
                     format(size, units = "auto", standard = "SI"), 
                     ")", sep = "")
               }
               contents_deferred <- TRUE
           }
           else {
               val <- .rs.valueAsString(obj)
               desc <- .rs.valueDescription(obj)
               if (class == "data.table" || class == "ore.frame" || 
                   class == "cast_df" || class == "xts" || class == 
                   "DataFrame" || is.list(obj) || is.data.frame(obj) || 
                   isS4(obj)) {
                   if (computeSize) {
                     contents <- .rs.valueContents(obj)
                   }
                   else {
                     val <- "NO_VALUE"
                     contents_deferred <- TRUE
                   }
               }
           }
       }
       list(name = .rs.scalar(objName), type = .rs.scalar(class), 
           clazz = c(class(obj), typeof(obj)), is_data = .rs.scalar(is.data.frame(obj)), 
           value = .rs.scalar(val), description = .rs.scalar(desc), 
           size = .rs.scalar(size), length = .rs.scalar(len), contents = contents, 
           contents_deferred = .rs.scalar(contents_deferred))
   })(<environment>, "d", TRUE)
Donald Seinen

1 Answer

This GitHub issue, pointed out by @technocrat, describes a known bug in earlier versions of RStudio related to disabling null external pointer checks. It has since been fixed by adding a preference check in .rs.describeObject():

.rs.readUiPref("check_null_external_pointers")

To check whether code is running inside RStudio, and whether that RStudio version is older than a given version number (here I use the current official release), a check can be included in the function or in the .onAttach of a package:

if(!is.na(Sys.getenv("RSTUDIO", unset = NA)) && .rs.api.versionInfo()$version < "2021.9.1.372"){
  # warning or action
}
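A slightly fuller sketch of such a check inside a package's .onAttach follows. It uses the rstudioapi package instead of the internal .rs.api.versionInfo(). The helper name rstudio_is_older_than and the message text are hypothetical, and the threshold is the version number used above.

```r
# Hypothetical helper: TRUE when `current` is older than `minimum`.
rstudio_is_older_than <- function(current, minimum = "2021.9.1.372") {
  as.numeric_version(current) < as.numeric_version(minimum)
}

.onAttach <- function(libname, pkgname) {
  # RStudio sets the RSTUDIO environment variable in its R sessions
  in_rstudio <- !is.na(Sys.getenv("RSTUDIO", unset = NA))
  if (!in_rstudio ||
      !requireNamespace("rstudioapi", quietly = TRUE) ||
      !rstudioapi::isAvailable()) {
    return(invisible())
  }
  current <- rstudioapi::versionInfo()$version
  if (rstudio_is_older_than(current)) {
    packageStartupMessage(
      "This RStudio version may raise a protection stack overflow ",
      "when very large environments are listed in the Environment pane."
    )
  }
  invisible()
}
```

Keeping the comparison in its own function makes it testable outside RStudio. Comparing numeric_version objects avoids the pitfalls of comparing version strings lexically.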
Donald Seinen