To clear things up for myself, I would like to better understand when data.table makes a copy and when it does not. As the question "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table" points out, if one simply runs the following, the original is modified as well:
library(data.table)
DT <- data.table(a=c(1,2), b=c(11,12))
print(DT)
# a b
# [1,] 1 11
# [2,] 2 12
newDT <- DT # reference, not copy
newDT[1, a := 100] # modify new DT
print(DT) # DT is modified too.
# a b
# [1,] 100 11
# [2,] 2 12
However, if one does the following (for example), only the new table is modified and the original is left untouched:
DT = data.table(a=1:10)
DT
#      a
#  1:  1
#  2:  2
#  3:  3
#  4:  4
#  5:  5
#  6:  6
#  7:  7
#  8:  8
#  9:  9
# 10: 10
newDT = DT[a<11]
newDT
#      a
#  1:  1
#  2:  2
#  3:  3
#  4:  4
#  5:  5
#  6:  6
#  7:  7
#  8:  8
#  9:  9
# 10: 10
newDT[1:5,a:=0L]
newDT
#      a
#  1:  0
#  2:  0
#  3:  0
#  4:  0
#  5:  0
#  6:  6
#  7:  7
#  8:  8
#  9:  9
# 10: 10
DT
#      a
#  1:  1
#  2:  2
#  3:  3
#  4:  4
#  5:  5
#  6:  6
#  7:  7
#  8:  8
#  9:  9
# 10: 10
As I understand it, this happens because subsetting with an i expression makes data.table return a whole new table, rather than a reference to the memory occupied by the selected elements of the old data.table. Is this correct?
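One way I could try to check this myself is to compare memory addresses. The following is only a rough sketch, assuming that address() (exported by data.table) reports where an object lives and that copy() makes an explicit deep copy. The names refDT, subDT and cpyDT are just ones I made up for illustration.

```r
library(data.table)

DT <- data.table(a = 1:10)

refDT <- DT            # plain assignment: should point at the same object
subDT <- DT[a < 11]    # subset via i: I expect a new table here
cpyDT <- copy(DT)      # explicit copy, for comparison

# Compare the address of each table with the address of the original
same_ref <- address(refDT) == address(DT)
same_sub <- address(subDT) == address(DT)
same_cpy <- address(cpyDT) == address(DT)

# Modify the subset by reference and confirm the original is left alone
subDT[1:5, a := 0L]
unchanged <- identical(DT$a, 1:10)

print(c(same_ref = same_ref, same_sub = same_sub,
        same_cpy = same_cpy, unchanged = unchanged))
```

If my understanding is right, only same_ref should come out TRUE among the three address comparisons, and unchanged should be TRUE as well.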
EDIT: Sorry, I meant i, not j (changed above).