I am doing this:
```r
myfun <- function(inputvar_vec) {
  # inputvar_vec is the input vector
  # ... do something ...
  # result = output vector
  return(result)
}

DT[, result := lapply(.SD, myfun), by = byvar, .SDcols = inputvar]
```
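For reference, here is a self-contained version of that pattern. The toy data and the body of `myfun` (a within-group running mean) are hypothetical stand-ins, not the real function. It shows what the call does: `lapply(.SD, myfun)` returns a list with one element per `.SDcols` column, and with a single name on the left of `:=` that element becomes the new column, computed separately for each `byvar` group.

```r
library(data.table)

# Hypothetical stand-in for myfun: receives one column's values for one group
# and returns a vector of the same length (here, a running mean).
myfun <- function(inputvar_vec) {
  cumsum(inputvar_vec) / seq_along(inputvar_vec)
}

DT <- data.table(byvar = c("a", "a", "b", "b", "b"),
                 inputvar = c(1, 3, 2, 4, 6))

# One list element per .SDcols column; assigned by reference to `result`
DT[, result := lapply(.SD, myfun), by = byvar, .SDcols = "inputvar"]
print(DT)
```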
I am getting the following warning:
```
Warning message:
In `[.data.table`(df1, , `:=`(prop, lapply(.SD, propEventInLastK)), :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
  so that := can add this new column by reference. At an earlier point, this
  data.table has been copied by R (or been created manually using structure()
  or similar). (and then some more stuff) ...
```
My guess is that a copy is being made because I am stacking up the result vectors after the `by` operation. Is that right?
Can anyone suggest a way to get rid of this warning? I have done the same thing with the apply family of functions and assumed the approach would carry over here.
My other question: can you pass a chunk of rows of a data.table (the subset defined by `by`) to a function such as `myfun` and have it operate on that whole chunk?
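On the second question: inside `j`, `.SD` is itself a data.table holding the current group's rows (restricted to the `.SDcols` columns, and excluding the `by` columns), so it can be handed to a function as a whole. A minimal sketch; `chunkfun` and the toy data are hypothetical, chosen only to show a function that uses two columns of each chunk and returns one value per row.

```r
library(data.table)

# Hypothetical function: receives one group's rows as a data.table
# and returns a vector with one value per row of that chunk.
chunkfun <- function(chunk) {
  chunk$trial * chunk$ts / sum(chunk$ts)
}

DT <- data.table(id = c(1, 1, 2, 2, 2),
                 trial = c(1, 0, 1, 1, 0),
                 ts = c(1, 3, 2, 2, 4))

# For each id, .SD holds that id's rows of the trial and ts columns
DT[, share := chunkfun(.SD), by = id, .SDcols = c("trial", "ts")]
print(DT)
```

A function called this way can also return a scalar (one value per group) or a list (several new columns), as long as the left side of `:=` matches.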
EDIT: Adding an example, as requested.
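```r
library(data.table)
library(fts)  # provides fts() and moving.sum()

# generate data
N = 10000
default = NA
value = 1
K = 5  # window length; hypothetical value, K is not defined in the original post
df = data.table(id = sample(1:5000, N, replace = TRUE),
                trial = sample(c(0, 1, 2), N, replace = TRUE),
                ts = sample(1:200, N, replace = TRUE))

# set keys
setkeyv(df, c("id", "ts"))
df[["trial"]] = as.numeric(df[["trial"]] == value)

testfun <- function(x) {
  L = length(x)
  x = x[L:1]                  # reverse the series
  x = fts(data = x)
  y = rep(default, L)         # groups shorter than K stay all NA
  if (L >= K) {
    y1 = as.numeric(moving.sum(x, K))
    y = c(y1, rep(default, L - length(y1)))  # pad to length L
  }
  return(y[L:1] / K)          # undo the reversal, turn sums into proportions
}

df[, prop := lapply(.SD, testfun), by = id, .SDcols = "trial"]
```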
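Side note on `testfun` itself: if the intent is the share of events over the current row and the K - 1 preceding rows within each `id` (the name `propEventInLastK` in the first warning suggests this, but I have not checked how `moving.sum` from fts pads its output, so the equivalence is an assumption), data.table's own `frollmean()` (data.table 1.12.0 or later) can compute it directly, without the fts conversion and the double reversal. A sketch with toy data and a hypothetical K:

```r
library(data.table)  # frollmean() needs data.table >= 1.12.0

K <- 2L  # window length (hypothetical value)

# Rows must already be in time order within each id
# (setkeyv(df, c("id", "ts")) in the question guarantees that).
DT <- data.table(id = c(1, 1, 1, 2, 2),
                 trial = c(1, 0, 1, 1, 1))

# Within each id: mean of trial over the current row and the K - 1 rows
# before it; the first K - 1 rows of each group are filled with NA.
DT[, prop := frollmean(trial, K, align = "right"), by = id]
print(DT)
```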
Still getting the same warning message:
```
Warning message:
In `[.data.table`(df, , `:=`(prop, lapply(.SD, testfun)), by = id,  :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
  so that := can add this new column by reference. At an earlier point, this
  data.table has been copied by R (or been created manually using structure()
  or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly)
  may copy the whole data.table. Use set* syntax instead to avoid copying:
  setkey(), setnames() and setattr(). Also, list(DT1,DT2) will copy the entire
  DT1 and DT2 (R's list() copies named objects), use reflist() instead if needed
  (to be implemented). If this message doesn't help, please report to
  datatable-help so the root cause can be fixed.
```
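The warning's own advice is to change a data.table only through `:=` and the `set*` functions, not through base R replacement functions. In the example, the line `df[["trial"]] = ...` is such a base R replacement and a likely candidate for the earlier copy, though that is an assumption I have not confirmed for every data.table version. A sketch of the data-generation step written entirely by reference; `set.seed(1)` is added only to make the run reproducible, and `address()` is used to check that the table was not copied.

```r
library(data.table)

set.seed(1)  # added for reproducibility; not in the original
N <- 10000
value <- 1
df <- data.table(id = sample(1:5000, N, replace = TRUE),
                 trial = sample(c(0, 1, 2), N, replace = TRUE),
                 ts = sample(1:200, N, replace = TRUE))
setkeyv(df, c("id", "ts"))  # set* function: sorts and keys by reference

addr_before <- address(df)

# Recode trial by reference with := instead of df[["trial"]] = ...
df[, trial := as.numeric(trial == value)]
# Instead of the line above, set() would also work by reference:
# set(df, j = "trial", value = as.numeric(df$trial == value))

# Same object as before: modified in place, not copied
stopifnot(identical(addr_before, address(df)))
```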