
I am curious whether the behavior of data.table with respect to environments is inconsistent. When working with data.tables, I expect that assigning a data.table to a new variable does not copy the data but creates a new pointer to the existing table. This does not appear to be true when the source data.table lives in another environment. For example,

> attach( new.env(), name="dt" )
> e <- as.environment("dt")
> 
> assign( "mydata", data.table( x=1:3, y=1 ), e)
> mydata
   x y
1: 1 1
2: 2 1
3: 3 1
> ls()
[1] "e"       

If we try to assign a new name to mydata, we don't get the expected behavior of having a pointer to the same data.

mydata2 <- mydata     # also makes a _copy_
mydata2[['y']] <- 5   # change the data
identical( mydata2, mydata )  
> FALSE

mydata2 does not point to the same value as mydata; a copy has been made. This is not what I would have expected from data.table. I expect data.tables to behave more like singletons, where only one copy of the data exists unless an explicit copy is made.

In addition, $<- and [[<- cause copies to be made in the global environment. $<<- and [[<<- do not (as expected). Also, := does not cause a copy to be made.

Is this inconsistent with respect to data.table's intent?


R Version Information:

> R.version
               _
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          0.1
year           2013
month          05
day            16
svn rev        62743
language       R
version.string R version 3.0.1 (2013-05-16)
nickname       Good Sport

Matt Dowle
ctbrown
  • `<-` isn't any different for `data.table`. You need to use the `:=` operator and `set*` functions that `data.table` provides to do things by reference. See also http://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another and `?":="`. Your expectations are correct so long as you use `:=` and `set*`. – Matt Dowle Mar 06 '14 at 16:40
  • Oh, I get it now. Thanks for the clarity and link to the previous answer regarding DT behavior. I would say that the behavior is a little inconsistent/unexpected, though I recognize it a limitation that you had to work with. Great package BTW. – ctbrown Mar 06 '14 at 19:27

1 Answer


In R, the operators <- and = do not copy, whatever the type of object:

a = c(1:10)
.Internal(inspect(a))
#@0x000000001072aa28 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
b = a
.Internal(inspect(b))
#@0x000000001072aa28 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...

Your mydata is not copied either when you do mydata2 <- mydata. You can check this with the method above, or by running something like mydata[, y := 5] right after the assignment and seeing that it changes both tables.
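As a rough sketch of that check: the snippet below uses `address()` from data.table to compare the two bindings and then applies `:=` through the new name. The environment `e` is a plain `new.env()` standing in for the attached environment in the question (an assumption made to keep the example self-contained), and the names `same_object` and `y_in_env` are only illustrative.

```r
library(data.table)

# Stand-in for the attached environment used in the question
e <- new.env()
assign("mydata", data.table(x = 1:3, y = 1), envir = e)

# Plain assignment: a second name bound to the same object, no copy
mydata2 <- e$mydata
same_object <- identical(address(e$mydata), address(mydata2))

# := updates the column in place, so the table stored in `e` changes too
mydata2[, y := 5]
y_in_env <- e$mydata$y

print(same_object)
print(y_in_env)
```

If both names really refer to one object, `same_object` should be TRUE and `y_in_env` should show the updated values, regardless of which environment holds the original binding.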

On the other hand, [[<- and the plethora of other assignment operators do copy, for both data.frame and data.table, and that is what you are seeing. The way to modify a data.table by reference is to use :=. None of the environment stuff above matters.
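To make the contrast concrete, here is a minimal sketch that puts `[[<-` next to `set()`, one of the `set*` functions mentioned in the comments. The variable names (`dt`, `alias`, `alias2`, `copied`, `dt_unchanged`) are made up for illustration, and the example assumes a reasonably recent data.table.

```r
library(data.table)

dt <- data.table(x = 1:3, y = 1)

# [[<- operates on a copy and rebinds `alias` to it; `dt` is left alone
alias <- dt
alias[["y"]] <- 5
copied <- !identical(address(dt), address(alias))
dt_unchanged <- all(dt$y == 1)

# set() modifies by reference, so the change is visible through `dt` as well
alias2 <- dt
set(alias2, j = "y", value = 7)
dt_changed <- all(dt$y == 7)

print(c(copied = copied, dt_unchanged = dt_unchanged, dt_changed = dt_changed))
```

When an independent table is actually wanted, `copy(dt)` makes that intent explicit instead of relying on the side effect of `[[<-`.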

eddi