I needed to assign a "second" id to group some values inside my original id
. this is my sample data:
dt<-structure(list(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
period = c("start", "end", "start", "end", "start", "end"),
date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date"))),
class = c("data.table", "data.frame"),
.Names = c("id", "period", "date"),
sorted = "id")
> dt
id period date
1: aaaa start 2012-03-02
2: aaaa end 2012-03-05
3: aaas start 2012-08-21
4: aaas end 2013-02-25
5: bbbb start 2012-03-31
6: bbbb end 2013-02-11
column id
needs to be grouped (using the same value in say id2
) according to this list:
> groups
[[1]]
[1] "aaaa" "aaas"
[[2]]
[1] "bbbb"
I used the following code, which seems to work by gives the following warning
:
> dt[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
Warning message:
In `[.data.table`(dt, , `:=`(id2, which(vapply(groups, function(x, :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this data.table has
been copied by R (or been created manually using structure() or similar). Avoid key<-,
names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use
set* syntax instead to avoid copying: setkey(), setnames() and setattr(). Also,
list (DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects),
use reflist() instead if needed (to be implemented). If this message doesn't help,
please report to datatable-help so the root cause can be fixed.
> dt
id period date id2
1: aaaa start 2012-03-02 1
2: aaaa end 2012-03-02 1
3: aaas start 2012-08-29 1
4: aaas end 2013-02-26 1
5: bbbb start 2012-03-31 2
6: bbbb end 2013-02-11 2
could someone briefly explain the nature of this warning and any eventual implication in the final results (if any)? thanks
EDIT:
the follwing code is actually showing when dt
is created and how is passed to the function that gives the warning:
f.main <- function(){
f2 <- function(x){
groups <- list(c("aaaa", "aaas"), "bbbb") # actually generated depending on the similarity between values of x$id
x <- x[, id2 := which(vapply(groups, function(x,y) any(x==y), .BY[[1]], FUN.VALUE=T)), by=id]
return(x)
}
x <- f1()
if(!is.null(x[["res"]])){
x <- f2(x[["res"]])
return(x)
} else {
# something else
}
}
f1 <- function(){
dt<-data.table(id = c("aaaa", "aaaa", "aaas", "aaas", "bbbb", "bbbb"),
period = c("start", "end", "start", "end", "start", "end"),
date = structure(c(15401L, 15401L, 15581L, 15762L, 15430L, 15747L), class = c("IDate", "Date")))
return(list(res=dt, other_results=""))
}
> f.main()
id period date id2
1: aaaa start 2012-03-02 1
2: aaaa end 2012-03-02 1
3: aaas start 2012-08-29 1
4: aaas end 2013-02-26 1
5: bbbb start 2012-03-31 2
6: bbbb end 2013-02-11 2
Warning message:
In `[.data.table`(x, , `:=`(id2, which(vapply(groups, function(x, :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this data.table
has been copied by R (or been created manually using structure() or similar).
Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole
data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr().
Also, list(DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects),
use reflist() instead if needed (to be implemented). If this message doesn't help,
please report to datatable-help so the root cause can be fixed.