
I have a large dataset (3667856 x 20). Running the code below on it produces the following warning message:

library(data.table)
library(zoo)

data[, new_quant_PD := na.locf(QUANT_PD,na.rm=FALSE), by=c('OBLIGOR_ID','PORTFOLIO','OBLIGATION_NUMBER')]
Warning messages:
1: In `[.data.table`(data, , `:=`(new_quant_PD, na.locf(QUANT_PD,  ... :
  Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting. If this message doesn't help, please report to datatable-help so the root cause can be fixed.

In order to understand the situation better, I created the following simpler (yet similar) example:

tmp = data.table(name=c('Zhao','Zhao','Zhao','Qian','Qian','Sun','Sun','Li','Li','Li'),score=c('B+',NA,'B',NA,NA,NA,'A',NA,'A-',NA))
tmp


   name score
 1: Zhao    B+
 2: Zhao    NA
 3: Zhao     B
 4: Qian    NA
 5: Qian    NA
 6:  Sun    NA
 7:  Sun     A
 8:   Li    NA
 9:   Li    A-
10:   Li    NA


tmp[,new_score:=na.locf(score,na.rm=FALSE),by='name']
tmp


    name score new_score
 1: Zhao    B+        B+
 2: Zhao    NA        B+
 3: Zhao     B         B
 4: Qian    NA        NA
 5: Qian    NA        NA
 6:  Sun    NA        NA
 7:  Sun     A         A
 8:   Li    NA        NA
 9:   Li    A-        A-
10:   Li    NA        A-

This smaller example does not generate a warning message at all.

In theory I could loop over all combinations of OBLIGOR_ID, PORTFOLIO, and OBLIGATION_NUMBER to find out which one(s) cause the trouble, but `data` is only part of an 81293658-row dataset that I have. I don't think I can afford that much loop time in R.

Any suggestion is greatly appreciated!

Ye Tian
  • 4
    How did you create the `data` data.table? The warning is clearly suggesting something might have gone wrong there. – Jaap Sep 04 '17 at 19:29

1 Answer


Good question, but it is not reproducible because we can't see where the object `data` came from. Knowing that step is critically important for helping you.

The warning message (which I wrote) is included in your question, which is good, but it appears there as one long line. Here it is again in full so it is easier to read:

Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference.
At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar).
Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table.
Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr.
Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting.
If this message doesn't help, please report to datatable-help so the root cause can be fixed.

The second sentence starts "At an earlier point ...". So, where did this `data` object come from? What are the reproducible steps to create this particular `data`? Do any of the hints already given in the warning message help at all? It would really help if you showed us, at the time you ask the question, that you read the warning message and tried its hints.
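
As a rough illustration of the `set*` hints in the warning, the sketch below pairs each copying-prone assignment form with its by-reference counterpart. The table `DT` and its column names are made up for the example and are not taken from the question's data.

```r
library(data.table)

# Hypothetical toy table; names and values are illustrative only.
DT <- data.table(id = 1:3, pd = c(0.1, NA, 0.3))

# Instead of names(DT) <- c("id", "quant_pd"), rename by reference:
setnames(DT, "pd", "quant_pd")

# Instead of attr(DT, "source") <- "toy", set the attribute by reference:
setattr(DT, "source", "toy")

# Instead of DT$flag <- ..., add a column by reference with set():
set(DT, j = "flag", value = is.na(DT$quant_pd))

# Nothing above went through R's replacement functions (names<-, attr<-),
# so a later := is applied to the same object, again by reference.
DT[, quant_pd_filled := ifelse(is.na(quant_pd), 0, quant_pd)]
```

Whether this removes the warning in a given script depends on where the copy was actually made, which is why the steps that created `data` matter.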

Matt Dowle
  • 1
    Thanks Matt! The data had columns merged from data.frame. Even though its "class" was "data.table" "data.frame" after merger, I still had to reconvert it to data.table for the na.locf function to work properly. – Ye Tian Sep 04 '17 at 21:28
  • @MattDowle - here's a small example: https://stackoverflow.com/a/62776002/89706 – Ofek Shilon Jul 07 '20 at 14:00
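
The fix described in the comment above (reconverting the merged object before using `:=`) might look like the sketch below. It assumes `setDT()` is used for the reconversion, which the comment does not specify, and the tables `scores` and `extra` are hypothetical stand-ins for the real data.

```r
library(data.table)
library(zoo)

# Hypothetical stand-ins: a data.table of scores and a plain data.frame
# holding an extra column to merge in.
scores <- data.table(name  = c('Zhao', 'Zhao', 'Zhao', 'Li', 'Li', 'Li'),
                     score = c('B+', NA, 'B', NA, 'A-', NA))
extra  <- data.frame(name = c('Zhao', 'Li'), grade = c(3L, 4L),
                     stringsAsFactors = FALSE)

# Merge in the data.frame column; sort = FALSE keeps the row order of scores.
merged <- merge(scores, extra, by = 'name', sort = FALSE)

# Reconvert in place (by reference) before adding columns with :=.
setDT(merged)

# Carry the last observed score forward within each name.
merged[, new_score := na.locf(score, na.rm = FALSE), by = 'name']
```

`setDT()` works by reference, so it avoids the full copy that `as.data.table()` or `data.table()` would make, which matters for a table with tens of millions of rows.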