
This question is not a duplicate of Error in setDT from data.table package.

library(data.table)
dt <- iris   # plain assignment, no copy is made
str(iris)    # 'data.frame' only
setDT(dt)
str(iris)    # now 'data.table' and 'data.frame'

Why should setDT() act on an object that is not its argument?

Thank you all for pointing out why iris is changed in parallel with dt. Unless one knew the answer already, there would be no way to know that the question was a duplicate.

Robert Hadow
  • Related (*dup*): [When should I use setDT() instead of data.table() to create a data.table?](https://stackoverflow.com/questions/41917887/when-should-i-use-setdt-instead-of-data-table-to-create-a-data-table) – pogibas Jul 09 '18 at 19:22
  • Related-ish: [Understanding exactly when a data.table is a reference to (vs a copy of) another data.table](https://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another) – Henrik Jul 09 '18 at 19:31

1 Answer


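After dt <- iris, the names iris and dt refer to the same object in memory, because R does not copy on plain assignment. setDT(dt) then modifies that shared object in place, so iris changes as well. To get an independent copy, use dt <- data.table::copy(iris).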

Consider this:

> dt <- iris
> tracemem(iris) == tracemem(dt)
[1] TRUE

but

> dt <- data.table::copy(iris)
> tracemem(iris) == tracemem(dt)
[1] FALSE

Reason

?data.table::setDT says:

When working on large lists or data.frames, it might be both time and memory consuming to convert them to a data.table using as.data.table(.), as this will make a complete copy of the input object before to convert it to a data.table. The setDT function takes care of this issue by allowing to convert lists - both named and unnamed lists and data.frames by reference instead. That is, the input object is modified in place, no copy is being made.
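The by-reference behaviour quoted above can be sketched on a small, made-up data.frame (the names df, alias, df2, safe, df3 and dt3 are hypothetical and chosen only for illustration, so that the built-in iris stays untouched). This is a minimal sketch, assuming data.table is installed; it contrasts setDT() on an alias with setDT() on a copy() and with as.data.table().

```r
library(data.table)

# Hypothetical example data, used instead of iris.
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Plain assignment does not copy: both names refer to the same object.
alias <- df
setDT(alias)    # converts by reference
class(df)       # the original name now also reports "data.table"

# copy() creates an independent object first, so the source keeps its class.
df2  <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
safe <- copy(df2)
setDT(safe)
class(df2)      # still "data.frame" only

# as.data.table() returns a new object and leaves its input unchanged.
df3 <- data.frame(id = 1:3)
dt3 <- as.data.table(df3)
class(df3)      # still "data.frame" only
```

The trade-off is the one the help page describes: copy() and as.data.table() cost time and memory on large inputs, while setDT() avoids the copy at the price of changing every name that points to the same object.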

jay.sf