
Why does data.table delete columns from an original data.table when removing columns from a copy of that data.table? This really makes no sense to me.

Take the following example:

library(data.table)

# Rebuilt from the dput() output; the .internal.selfref pointer is dropped because it cannot be parsed
DT1 <- data.table(pnum = c(7265873, 7266757, 7266757, 7268524, 7268524, 
7272620, 7272620, 7273253, 7273253, 7283628, 7283628, 7289442, 
7289442, 7289525, 7289525, 7289525, 7301987, 7301987, 7305259, 
7305259, 7307986, 7307986, 7310332, 7310332, 7333490, 7333490, 
7333502, 7333502, 7414991, 7414991), invid = c(24104, 38775, 
38776, 34281, 34282, 20002, 22284, 31921, 31922, 26841, 26843, 
17763, 17764, 38087, 38088, 38089, 34843, 38412, 32514, 33946, 
28587, 28588, 17204, 17205, 28587, 28588, 28587, 28588, 37008, 
37009), dom_st = c(0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 2, 2, 0, 
0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0), prim_st = c(1, 
2, 2, 3, 3, 1, 1, 1, 1, 1, 1, 11, 11, 1, 2, 3, 3, 3, 1, 1, 5, 
5, 3, 3, 5, 5, 5, 5, 3, 3), pat_st = c(1, 2, 2, 2, 2, 2, 2, 1, 
1, 1, 1, 48, 63, 1, 1, 1, 1, 1, 1, 1, 5, 5, 14, 14, 5, 5, 5, 
5, 1, 1), net_st = c(0, 3, 3, 2, 2, 0, 0, 4, 4, 2, 2, 10, 9, 
0, 0, 1, 2, 2, 0, 0, 2, 2, 4, 4, 2, 2, 2, 2, 0, 0))

Say the above data.table is called DT1.

Now if I do the following:

DT2 <- DT1
DT2[, prim_st:= NULL]

If I then check in DT1, the column 'prim_st' is gone as well.

Why, and how can I prevent this?

SJDS
  • Read the introduction tutorial to the data.table package. This is a consequence of the mechanisms that make it so efficient. You can use the 'copy' function to avoid this. – Roland Jun 09 '17 at 07:05
  • It's a feature of data.table. It has ample documentation. Hence no bug! – amonk Jun 09 '17 at 08:37
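
To illustrate what the comments describe, here is a minimal sketch using a small made-up table in place of the question's DT1 (the three-row data and the names DT3 and same_object are assumptions for illustration only). It suggests that plain assignment with <- only binds a second name to the same data.table, whereas copy() produces an independent object that can be modified by reference without touching the original.

```r
library(data.table)

# Small hypothetical table standing in for the question's DT1
DT1 <- data.table(pnum = c(1, 2, 3), prim_st = c(1, 2, 3), net_st = c(0, 3, 3))

# Plain assignment does not copy: both names point at the same object
DT2 <- DT1
same_object <- identical(address(DT1), address(DT2))
print(same_object)

# copy() makes an independent copy, so := on it leaves DT1 alone
DT3 <- copy(DT1)
DT3[, prim_st := NULL]

print(names(DT1))  # prim_st is still a column of DT1
print(names(DT3))  # prim_st has been removed from the copy only
```

In the question's code, replacing DT2 <- DT1 with DT2 <- copy(DT1) should therefore keep 'prim_st' in DT1.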
