This has been bothering me for quite a while, and I think there's something I just don't understand about the data.table package. If I want two versions of my data.table under different names and I forget to use "copy" when assigning the second one, then data.table treats the two names as the same object: changing one by reference also changes the other.
- Is using the "copy" function the only systematic way to avoid this? That is, if I don't use "copy", will the two names always refer to the same object?
- What's the purpose of this feature? Is it something to do with memory usage? It seems like it could cause serious inadvertent errors if I change one DT and then use the original object. Also, a new data.table user coming from base R who doesn't know about this behavior could end up with systematic problems throughout their code.
- What's the point of the setDT function if it doesn't actually "set" the data table into a new object?
Here's an illustrative example:
library(data.table)
#####BOTH SETDT & COPY#####
first_dt <- data.frame(a = c(1,2,3), b = c(9,8,7))
setDT(first_dt)
second_dt <- copy(first_dt)
setDT(second_dt)
first_dt[,a:=a/50]
second_dt[,b:=b/50]
print(first_dt)
# a b
#1: 0.02 9
#2: 0.04 8
#3: 0.06 7
print(second_dt)
# a b
#1: 1 0.18
#2: 2 0.16
#3: 3 0.14
#####BOTH SETDT#####
first_dt <- data.frame(a = c(1,2,3), b = c(9,8,7))
setDT(first_dt)
second_dt <- first_dt
setDT(second_dt)
first_dt[,a:=a/50]
second_dt[,b:=b/50]
print(first_dt)
# a b
#1: 0.02 0.18
#2: 0.04 0.16
#3: 0.06 0.14
print(second_dt)
# a b
#1: 0.02 0.18
#2: 0.04 0.16
#3: 0.06 0.14
#####SINGLE SETDT#####
first_dt <- data.frame(a = c(1,2,3), b = c(9,8,7))
setDT(first_dt)
second_dt <- first_dt
first_dt[,a:=a/50]
second_dt[,b:=b/50]
print(first_dt)
# a b
#1: 0.02 0.18
#2: 0.04 0.16
#3: 0.06 0.14
print(second_dt)
# a b
#1: 0.02 0.18
#2: 0.04 0.16
#3: 0.06 0.14
#####AS.DATA.TABLE#####
first_dt <- as.data.table(data.frame(a = c(1,2,3), b = c(9,8,7)))
second_dt <- first_dt
first_dt[,a:=a/50]
second_dt[,b:=b/50]
print(first_dt)
# a b
#1: 0.02 0.18
#2: 0.04 0.16
#3: 0.06 0.14
print(second_dt)
# a b
#1: 0.02 0.18
#2: 0.04 0.16
#3: 0.06 0.14
#####AS.DATA.TABLE WITH JUST COPY#####
first_dt <- as.data.table(data.frame(a = c(1,2,3), b = c(9,8,7)))
second_dt <- copy(first_dt)
first_dt[,a:=a/50]
second_dt[,b:=b/50]
print(first_dt)
# a b
#1: 0.02 9
#2: 0.04 8
#3: 0.06 7
print(second_dt)
# a b
#1: 1 0.18
#2: 2 0.16
#3: 3 0.14
#####ANOTHER AS.DATA.TABLE#####
first_dt <- (data.frame(a = c(1,2,3), b = c(9,8,7)))
second_dt <- as.data.table(first_dt)
first_dt <- as.data.table(first_dt)
first_dt[,a:=a/50]
second_dt[,b:=b/50]
print(first_dt)
print(second_dt)
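
For what it's worth, one way to check whether two names really point at the same table seems to be data.table's address() function, which returns the memory address of an object. The following is only a small sketch: the names dt_a, dt_b, dt_c and df are hypothetical and chosen so they don't disturb the objects above, and I'm assuming that matching addresses is a fair proxy for "same object".

```r
library(data.table)

dt_a <- data.table(a = c(1, 2, 3), b = c(9, 8, 7))

dt_b <- dt_a        # plain assignment: a second name for the same object
dt_c <- copy(dt_a)  # explicit copy: a separate object

same_ab <- address(dt_a) == address(dt_b)  # TRUE
same_ac <- address(dt_a) == address(dt_c)  # FALSE

# := through one name is visible through the other name, but not in the copy
dt_a[, a := a / 50]
b_follows_a <- isTRUE(all.equal(dt_b$a, c(0.02, 0.04, 0.06)))  # TRUE
c_unchanged <- identical(dt_c$a, c(1, 2, 3))                   # TRUE

# setDT() converts the existing object in place; no new name is needed
df <- data.frame(a = c(1, 2, 3))
setDT(df)
df_is_dt <- is.data.table(df)  # TRUE
```

If the addresses match, any := applied through one name shows up under the other, which is consistent with the "BOTH SETDT", "SINGLE SETDT" and "AS.DATA.TABLE" cases above.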