
I am doing this:

```r
myfun <- function(inputvar_vec){
  # inputvar_vec is the input vector
  # do something (placeholder: the real computation goes here)
  result <- inputvar_vec  # result = output vector
  return(result)
}

DT[, result := lapply(.SD, myfun), by = byvar, .SDcols = inputvar]
```

I am getting the following warning:

```
Warning message:
In `[.data.table`(df1, , `:=`(prop, lapply(.SD, propEventInLastK)),  :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
  so that := can add this new column by reference. At an earlier point, this
  data.table has been copied by R (or been created manually using structure()
  or similar). ... (and then some more stuff)
```

My guess is that a copy is being made because I am stacking up the result vectors (after a `by` operation)?

Can anyone suggest a way to get rid of this warning? I have done this with `apply` functions and thought the same approach should extend to data.table.

My other question is: can you pass a chunk of rows from a data frame (the subset defined by the `by` statement) to a function `myfun` and have it operate on that chunk?

Edit: adding a reproducible example, as requested.

```r
library(data.table)
library(fts)  # provides fts() and moving.sum()

# generate data
N = 10000
default = NA
value = 1
K = 5  # window length; not defined in the original post, value assumed here
df = data.table(id = sample(1:5000, N, replace=TRUE),
                trial = sample(c(0,1,2), N, replace=TRUE),
                ts = sample(1:200, N, replace=TRUE))

# set keys
setkeyv(df, c("id", "ts"))

df[["trial"]] = as.numeric(df[["trial"]]==value)

testfun <- function(x){
  L = length(x)
  x = x[L:1]                       # reverse the group's vector
  x = fts(data=x)
  y = rep(default, L)
  if(L >= K){
    y1 = as.numeric(moving.sum(x, K))
    y = c(y1, rep(default, L-length(y1)))  # pad back to length L
  }
  return(y[L:1]/K)                 # restore original order, scale by K
}

df[, prop := lapply(.SD, testfun), by = id, .SDcols = "trial"]
```

Still getting the same warning message:

```
Warning message:
In `[.data.table`(df, , `:=`(prop, lapply(.SD, testfun)), by = id,  :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table, so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr(). Also, list(DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects), use reflist() instead if needed (to be implemented). If this message doesn't help, please report to datatable-help so the root cause can be fixed.
```
Asked by user1971988
  • How did you create the data.table? What exactly is `do something`? Please provide a [reproducible example](http://stackoverflow.com/a/5963610/1412059). – Roland Jul 31 '13 at 07:41
  • Apologies. It is a dataset with millions of rows, so please give me some time; I will post a reproducible example soon. The data.table was created as: `DT = data.table(readRDS(fileAsDataFrame))` – user1971988 Jul 31 '13 at 07:52
  • Have you tried `DF <- readRDS(fileAsDataFrame); DT <- data.table(DF)` to rule out that this step creates the problem? – Roland Jul 31 '13 at 08:07
  • @Roland Yes. That did not fix the problem. – user1971988 Aug 02 '13 at 03:20

1 Answer

The issue arises in this line:

```r
df[["trial"]] = as.numeric(df[["trial"]]==value)
```

which is not a data.table approach. The data.table approach is to use `:=`:

```r
df[, trial := as.numeric(trial == value)]
```

This should avoid the issue.
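Another by-reference option is data.table's `set()`, which replaces a column without going through `[[<-`. A minimal sketch on a small made-up table (the values are illustrative, not the question's data), using data.table's `address()` to check that the table is modified in place rather than copied:

```r
library(data.table)

# Small illustrative table; not the question's data
df <- data.table(trial = c(0, 1, 2, 1))
value <- 1
addr_before <- address(df)

# set() replaces the "trial" column by reference (no [[<- involved)
set(df, j = "trial", value = as.numeric(df[["trial"]] == value))

# := can then add a new column by reference without the selfref warning
df[, doubled := trial * 2]

# Same memory address: the table was modified in place, not copied
identical(address(df), addr_before)
```

Reading a column with `df[["trial"]]` is fine; it is only the `[[<-` *assignment* form that routes through the data.frame method.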

For an explanation of why copies are made (and thus why the internal self-reference can be invalidated), see the question "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table".

It is important to realize that there is no `[[<-` method for data.tables, so `[[<-.data.frame` is called instead. That method copies the entire object, and it does none of the careful things that a data.table method (such as `[<-.data.table`) does, for example returning a valid data.table.
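On the second question (passing the rows of each `by` group to a function): inside `j`, data.table exposes the current group's rows as `.SD`, so the chunk can be handed to a function directly. A minimal sketch, where `group_summary()` and the toy table are hypothetical:

```r
library(data.table)

dt <- data.table(id = c(1, 1, 2, 2, 2), x = 1:5, y = c(2, 4, 6, 8, 10))

# Hypothetical helper: receives all rows of one group as a data.table
# and returns a list, which becomes the result columns for that group
group_summary <- function(chunk) {
  list(n = nrow(chunk), total = sum(chunk$x * chunk$y))
}

# .SD holds the current group's rows (all columns except the by column)
res <- dt[, group_summary(.SD), by = id]
res
```

Each group contributes one row here (`id`, `n`, `total`); restrict the columns passed in with `.SDcols` if the function only needs some of them.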

Answered by mnel (edited by Community)
  • Yes, that solves the problem. Thank you very much. I still think of the data in a list format, which is why I am struggling to move away from list-based commands. I guess I have to change that. – user1971988 Aug 02 '13 at 03:37
  • Can you point me to something that will help me understand why the copying warning was showing up? Was that statement being construed as `.internal.selfref`? – user1971988 Aug 02 '13 at 03:49
  • @user1971988 See my edit, and read the package vignettes as well; they are brilliant and illuminating. – mnel Aug 02 '13 at 03:58