I defined a function, try_1, that computes an output from two input data sets.

library(data.table)

# the function
try_1 <- function(in_a, in_b){
  in_b[, `:=`(Vb = Vb/1000)]
  tmp <- in_a
  tmp[, value := Va/in_b$Vb]   
  return(tmp)
}

dt_a <- data.table(Va = c(2, 5))
dt_b <- data.table(Vb = c(1000, 2000))

# run for the 1st time
dt <- try_1(dt_a, dt_b)

# run for the 2nd time
dt <- try_1(dt_a, dt_b)

# run for the 3rd time
dt <- try_1(dt_a, dt_b)

If I run the function only once, the output dt is as expected, i.e.,

   Va value
1:  2   2.0
2:  5   2.5

However, if I run it a second time, dt changes! (I thought it would be the same as after the first run, since the statement didn't change.) The values are now 1000 times larger.

   Va value
1:  2  2000
2:  5  2500

If I run it a third time, dt changes again:

   Va   value
1:  2 2000000
2:  5 2500000

Could anyone tell me what causes this? Why do repeated runs produce different results?

NelsonGon
T X
    I suspect it has to do with this line: `tmp <- in_a` You're making a copy of a `data.table` which I think modifies it even though this should be local. – NelsonGon Nov 11 '21 at 10:22
    You use `:=`, “assignment by reference.” .. you mutate your dt_b object on each run... – dario Nov 11 '21 at 10:24

1 Answer

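Assigning a data.table with <- doesn't copy the data; it only creates another reference to the same data.table. tmp[, value := Va/in_b$Vb] therefore adds the value column to in_a (and so to dt_a) as well. By the same mechanism, in_b[, `:=`(Vb = Vb/1000)] modifies dt_b in place on every call, so Vb shrinks by a factor of 1000 each time and value grows by a factor of 1000.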

This is explained extensively in this question.
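One way to avoid the side effects is to work on explicit copies inside the function. The following is only a minimal sketch: try_2 is a hypothetical name for a variant of try_1, and it assumes that deep-copying both inputs with data.table's copy() is acceptable for data of this size.

```r
library(data.table)

# try_2 is a hypothetical, side-effect-free variant of try_1
try_2 <- function(in_a, in_b) {
  b <- copy(in_b)             # deep copy, so the caller's dt_b is not modified
  b[, Vb := Vb / 1000]
  tmp <- copy(in_a)           # deep copy, so the caller's dt_a gains no new column
  tmp[, value := Va / b$Vb]
  return(tmp)
}

dt_a <- data.table(Va = c(2, 5))
dt_b <- data.table(Vb = c(1000, 2000))

first  <- try_2(dt_a, dt_b)
second <- try_2(dt_a, dt_b)

second$value   # 2.0 2.5, same as after the first call
names(dt_a)    # still only "Va"
dt_b$Vb        # still 1000 2000
```

If copying is too expensive, another option might be to skip := altogether and compute the scaled values in a plain expression, for example Va / (in_b$Vb / 1000), which reads the inputs without changing them.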

tstenner