I am seeing unexpected behavior when modifying a column by group in a data.table:
```r
library(data.table)

# creating a data.frame
data <- data.frame(sequence = rep(c("A","B","C","D"), c(2,3,3,2)), trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1
# copying data to data_temp
data_temp <- data
# assigning a random value to data_temp so that it should no longer be a
# copy of "data"
data_temp[1, "random_value"] <- rnorm(1)
# converting data_temp to data.table
setDT(data_temp)
# summing trim within each sequence group, then keeping only groups with trim == 0
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]
```
data_temp comes out as expected, with only the rows of sequence "C" remaining. However, I expected the "data" object to remain unchanged, and it did not. Afterwards, "data" looks like this:
```
   sequence trim random_value
1         A    2           NA
2         A    2           NA
3         B    2           NA
4         B    2           NA
5         B    2           NA
6         C    0           NA
7         C    0           NA
8         C    0           NA
9         D    1           NA
10        D    1           NA
```
So the by-reference assignment to the "trim" column also modified the original data.frame.
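One way to test whether "data" and "data_temp" still share the "trim" column after the partial copy is data.table's address() function, which reports where a vector lives in memory. This is a minimal diagnostic sketch, assuming that R duplicated only the column that was actually changed (a shallow copy) when "random_value" was assigned. The names same_random and same_trim are illustrative only.

```r
library(data.table)  # for address()

# Rebuild the objects from the question, stopping just before setDT()
data <- data.frame(sequence = rep(c("A", "B", "C", "D"), c(2, 3, 3, 2)),
                   trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1
data_temp <- data
data_temp[1, "random_value"] <- rnorm(1)

# The modified column must have been duplicated, so its addresses differ
same_random <- address(data$random_value) == address(data_temp$random_value)
# An untouched column may still be the very same vector in both objects
same_trim <- address(data$trim) == address(data_temp$trim)

print(c(random_value = same_random, trim = same_trim))
```

If "trim" prints TRUE, both objects point at one vector. setDT() converts data_temp in place without copying its columns, so a later `trim := ...` that writes into that vector would also show up in "data".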
I am using data.table_1.11.4 and R version 3.4.3 for compatibility reasons.
Is this caused by the old versions, or am I doing something wrong? Do I need to change the code to avoid it?
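One candidate change, offered as a hedged sketch rather than a confirmed fix: take an explicit deep copy with data.table::copy() instead of plain `<-`, so that no column vector is shared with "data" before setDT() and `:=` run. This assumes the shared "trim" column is the cause of the problem.

```r
library(data.table)

data <- data.frame(sequence = rep(c("A", "B", "C", "D"), c(2, 3, 3, 2)),
                   trim = 0, random_value = NA)
data[c(1:4, 10), "trim"] <- 1

# copy() duplicates every column, so data_temp shares nothing with data
data_temp <- copy(data)
data_temp[1, "random_value"] <- rnorm(1)
setDT(data_temp)

# The by-reference update now writes only into data_temp's own "trim" column
data_temp <- data_temp[, trim := sum(trim), by = sequence][trim == 0]

print(data_temp)  # only the "C" rows
print(data$trim)  # the original 0/1 values
```

The cost is one full copy of the data up front, which is the price of keeping "data" untouched while still using `:=` on the copy.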