I'm seeing unexpected behavior when creating new columns inside a function that takes a data.table as input: the function alters the input data.table itself. Take this simple case:

library(data.table)

DTest<-data.table(v1=rnorm(10,0,1),v2=rnorm(10,0,1))

test<-function(DT){
  DT[,var0:=rnorm(.N,0,1)]
  DT[,var1:=numeric()]
  return(DT)
}

DTest
             v1         v2
 1:  0.004911561  0.3054059
 2:  0.370564395  0.8336796
 3:  0.860755880  0.1052963
 4:  1.252397542 -0.0401276
 5:  0.372725388  1.0474662
 6: -0.090960500  1.2666136
 7: -1.457178835 -0.6966777
 8:  0.195528018 -0.4050465
 9: -0.131193864 -0.8281367
10: -0.769164801  0.3034279

a<-test(DTest)
DTest
              v1         v2        var0 var1
 1:  0.004911561  0.3054059  0.48903710   NA
 2:  0.370564395  0.8336796 -0.06011728   NA
 3:  0.860755880  0.1052963 -0.46971666   NA
 4:  1.252397542 -0.0401276 -0.63927446   NA
 5:  0.372725388  1.0474662 -0.48513926   NA
 6: -0.090960500  1.2666136 -1.38466919   NA
 7: -1.457178835 -0.6966777  0.17275922   NA
 8:  0.195528018 -0.4050465 -1.13829455   NA
 9: -0.131193864 -0.8281367  0.50847027   NA
10: -0.769164801  0.3034279  0.65679337   NA

a
             v1         v2        var0 var1
 1:  0.004911561  0.3054059  0.48903710   NA
 2:  0.370564395  0.8336796 -0.06011728   NA
 3:  0.860755880  0.1052963 -0.46971666   NA
 4:  1.252397542 -0.0401276 -0.63927446   NA
 5:  0.372725388  1.0474662 -0.48513926   NA
 6: -0.090960500  1.2666136 -1.38466919   NA
 7: -1.457178835 -0.6966777  0.17275922   NA
 8:  0.195528018 -0.4050465 -1.13829455   NA
 9: -0.131193864 -0.8281367  0.50847027   NA
10: -0.769164801  0.3034279  0.65679337   NA

The function should not be altering the input DTest, should it? If it should, how can I avoid it? Interestingly, although DTest shows 4 columns when I print it or check its dimensions, the Environment pane in RStudio reports 2 variables for DTest and correctly reports 4 for a.

dim(DTest)
[1] 10  4

data.table version: 1.12.8

Note that I've created a similar function but using a data.frame as input and everything goes as expected.
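
For comparison, a minimal sketch of what such a data.frame version might look like. The names test_df, DFtest and a_df are hypothetical and the original data.frame function was not posted, so this is only an assumption about its shape. With a plain data.frame, R copies on modification, so the input keeps its two columns:

```r
# Hypothetical data.frame counterpart of test(); not the original function
test_df <- function(DF) {
  DF$var0 <- rnorm(nrow(DF))  # modifies a local copy of DF, not the caller's object
  DF$var1 <- NA_real_
  DF
}

DFtest <- data.frame(v1 = rnorm(10), v2 = rnorm(10))
a_df <- test_df(DFtest)

ncol(DFtest)  # input is unchanged
ncol(a_df)    # result carries the two new columns
```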

EdM

1 Answer

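This is expected: `:=` modifies a data.table by reference, so the function changes the very object the caller passed in. To avoid that, work on a copy made with copy():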

test <- function(DT) {
  ODT <- copy(DT)             # deep copy, so DT itself is left untouched
  ODT[,var0:=rnorm(.N,0,1)]
  ODT[,var1:=numeric()][]     # trailing [] makes the returned table print
}
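
As a rough check of the difference, the sketch below runs an in-place version and a copying version side by side. The helper names add_cols_in_place and add_cols_on_copy are hypothetical, made up for illustration, and only var0 is added to keep it short:

```r
library(data.table)

# Hypothetical helpers for illustration only
add_cols_in_place <- function(DT) {
  DT[, var0 := rnorm(.N)]    # := updates the caller's table by reference
  invisible(DT)
}

add_cols_on_copy <- function(DT) {
  out <- copy(DT)            # deep copy; the caller's table is not touched
  out[, var0 := rnorm(.N)]
  out[]
}

DTest <- data.table(v1 = rnorm(10), v2 = rnorm(10))

b <- add_cols_on_copy(DTest)
names_after_copy <- names(DTest)   # still only v1 and v2

a <- add_cols_in_place(DTest)
names_after_in_place <- names(DTest)   # DTest has gained var0
```

If a function is meant to modify its input, the in-place form avoids duplicating a large table; copy() is the safer choice when the caller expects its data.table to stay as it was.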
s_baldur