
I'm asking because I'm trying to identify the root cause(s) for the warnings about "Invalid .internal.selfref detected and fixed" that I have triggered on five occasions while developing code using data.table. It seems pretty clear that, somewhere along the line, my code performs an operation on a data.table which causes it to be copied -- at least under some conditions. It's a long path to the warning, and my code-inspection hasn't revealed any likely suspects. Until I can localise the defect, I can't produce an MRE -- nor can I be confident of the integrity of the data.tables produced by my code.

A reasonably efficient way to localise what I'm tentatively labelling an "inadvertent copy defect" would be to pepper my code with stopifnot() invocations. But I can't figure out how to write a valid.internal.selfref() method.

Section 5.13 of Writing R Extensions tells me that this method can't be written in R. I'm pretty sure there'll be a C-language internal method of data.table which guards the "Invalid .internal.selfref detected and fixed" warning.

The question "Warning: 'Invalid .internal.selfref detected' when adding a column to a data.table returned from a function" has a very nice explanation of .internal.selfref, but it doesn't reveal (at least in my reading) how a user of data.table could test this sentinel.

  • It's unlikely such a function would be exported since `.internal.selfref` is supposed to be an implementation detail. Without knowing more details, liberal use of `copy()` in situations where this warning arises can be seen as a good thing. – MichaelChirico Dec 18 '22 at 02:08
  • Indeed `copy()` can be helpful, if you're not terribly concerned about the integrity of the attributes (and the data!) in your `data.table`. However I'm developing a package that will perform stochastic experimentation, so it's really important that the experimental data be reliably preserved -- along with the values of secondary factors (which I'm storing in object-level attributes). I can read corrupted `data.tables` using `load()`, copy them using `data.table::copy()`, and `save()` them again -- that'll clear the copy-detection sentinel but won't repair any corrupted data or attribute. – Clark Thomborson Dec 18 '22 at 04:18

1 Answer


Looking at data.table source code, this warning seems to be triggered here.

This means you could use the internal data.table:::selfrefok function:

```r
library(data.table)

data.table:::selfrefok(data.table(x=1))
#> [1] 1
data.table:::selfrefok(data.frame(x=1))
#> [1] 0
```
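For peppering checks through a long code path, the call above can be wrapped in a small assertion helper. This is only a sketch: `assert_selfref_ok` is a hypothetical name, not part of data.table, and it relies on the unexported `selfrefok`, whose behaviour may change between releases. It assumes, going by the output shown above, that a return value of 1 means the self-reference is intact and treats anything else as a failure.

```r
library(data.table)

# Hypothetical helper (not part of data.table): stop with a labelled message
# unless the internal self-reference check returns 1.
assert_selfref_ok <- function(dt, label = deparse(substitute(dt))) {
  # selfrefok is internal and unexported, so it may change without notice.
  status <- data.table:::selfrefok(dt)
  if (!identical(as.integer(status), 1L)) {
    stop(sprintf("selfref check failed for '%s' (status %s)", label, status))
  }
  invisible(dt)  # return the input invisibly so the check can sit in a pipeline
}

DT <- data.table(x = 1)
assert_selfref_ok(DT)  # passes silently for a freshly created data.table
```

Placing a call after each suspect operation narrows down which step first fails the check, and the label in the error message identifies the object involved.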
Waldi
  • Thanks!! I'm now hot on the trail of the (until-now) elusive DT operation in my code that corrupts a `data.table`. `stopifnot(data.table:::selfrefok(dt)==1)` is my new friend ;-) – Clark Thomborson Dec 18 '22 at 03:06
  • I have isolated and repaired a couple of defects. Subclassing a DT using `class(DT) <-` had corrupted my DT, and it was also corrupted by `DT <- dplyr::bind_rows(DT,row)`. The former is sort-of-obviously hazardous after thinking through the implications of the .internal.selfref warning message. The latter is a surprising hazard, given that dplyr is "One of the core packages of the tidyverse in the R programming language". – Clark Thomborson Dec 18 '22 at 04:32
  • glad `selfrefok` helped to detect some issues ;-) . `dplyr::bind_rows` [seems to be a known one](https://github.com/Rdatatable/data.table/issues/3274) – Waldi Dec 18 '22 at 06:56