I'm seeing unexpected behavior when creating new columns inside a function that takes a data.table as input: the function also alters the input data.table. Take this simple case:
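library(data.table)
DTest <- data.table(v1 = rnorm(10, 0, 1), v2 = rnorm(10, 0, 1))
test <- function(DT) {
  DT[, var0 := rnorm(.N, 0, 1)]  # add a column of random draws
  DT[, var1 := numeric()]        # add an empty numeric column (shows as NA below)
  return(DT)
}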
DTest
v1 v2
1: 0.004911561 0.3054059
2: 0.370564395 0.8336796
3: 0.860755880 0.1052963
4: 1.252397542 -0.0401276
5: 0.372725388 1.0474662
6: -0.090960500 1.2666136
7: -1.457178835 -0.6966777
8: 0.195528018 -0.4050465
9: -0.131193864 -0.8281367
10: -0.769164801 0.3034279
a<-test(DTest)
DTest
v1 v2 var0 var1
1: 0.004911561 0.3054059 0.48903710 NA
2: 0.370564395 0.8336796 -0.06011728 NA
3: 0.860755880 0.1052963 -0.46971666 NA
4: 1.252397542 -0.0401276 -0.63927446 NA
5: 0.372725388 1.0474662 -0.48513926 NA
6: -0.090960500 1.2666136 -1.38466919 NA
7: -1.457178835 -0.6966777 0.17275922 NA
8: 0.195528018 -0.4050465 -1.13829455 NA
9: -0.131193864 -0.8281367 0.50847027 NA
10: -0.769164801 0.3034279 0.65679337 NA
a
v1 v2 var0 var1
1: 0.004911561 0.3054059 0.48903710 NA
2: 0.370564395 0.8336796 -0.06011728 NA
3: 0.860755880 0.1052963 -0.46971666 NA
4: 1.252397542 -0.0401276 -0.63927446 NA
5: 0.372725388 1.0474662 -0.48513926 NA
6: -0.090960500 1.2666136 -1.38466919 NA
7: -1.457178835 -0.6966777 0.17275922 NA
8: 0.195528018 -0.4050465 -1.13829455 NA
9: -0.131193864 -0.8281367 0.50847027 NA
10: -0.769164801 0.3034279 0.65679337 NA
The function should not be altering the input DTest, should it? If it should, how can I avoid it? Interestingly, although printing DTest or checking its dimensions shows 4 columns, the Environment pane in RStudio reports 2 variables for DTest and correctly reports 4 for a.
dim(DTest)
[1] 10 4
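One way to check whether a and DTest are actually the same object in memory, rather than two tables that happen to look alike, is to compare their addresses. This is only a sketch: it assumes data.table's address() helper is available, and it re-creates DTest and test() so that it runs on its own.

```r
library(data.table)

# Re-create the example from the question so this block is self-contained
DTest <- data.table(v1 = rnorm(10, 0, 1), v2 = rnorm(10, 0, 1))
test <- function(DT) {
  DT[, var0 := rnorm(.N, 0, 1)]
  DT[, var1 := numeric()]
  return(DT)
}
a <- test(DTest)

# Compare the memory addresses of the input and the returned table
same_object <- identical(address(DTest), address(a))
print(same_object)

# Column counts as R itself reports them (independent of the RStudio pane)
print(ncol(DTest))
print(ncol(a))
```

If same_object is TRUE, the two names refer to one table, which would be consistent with DTest showing the new columns.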
data.table version: 1.12.8
Note that I've written a similar function that takes a data.frame as input, and there everything behaves as expected.
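For comparison, the data.frame version looks roughly like the sketch below. The names DFtest and test_df are illustrative, not the exact code I ran, and var1 is filled with NA_real_ to mirror the NA column in the data.table output.

```r
# Hypothetical data.frame counterpart of the data.table example
DFtest <- data.frame(v1 = rnorm(10, 0, 1), v2 = rnorm(10, 0, 1))

test_df <- function(DF) {
  DF$var0 <- rnorm(nrow(DF), 0, 1)  # new column of random draws
  DF$var1 <- NA_real_               # new numeric column filled with NA
  return(DF)
}

b <- test_df(DFtest)

ncol(DFtest)  # 2: the input is left untouched
ncol(b)       # 4: only the returned copy has the new columns
```

Here the input keeps its 2 columns and only the returned object has 4, which is the behavior I expected from the data.table version as well.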