This question about data.table has two parts:

First, the disappearing row.names in data.table. See the code below: converting a data.frame to a data.table zaps the row.names. But even after I add the row.names as a column, they are zapped during the conversion. What am I doing wrong?

Second, the communicating data.tables. See the code below: if I make a new data.table from an existing one, the old and the new one seem to be communicating. In other words, they should be different tables, but updating table 1 also updates table 2. What am I doing wrong?

library(data.table)
library(stringr)

# part 1 - the zapped row.names...
data(mtcars)
dt=mtcars
dt$cars=row.names(dt) # add row.names as field
cars=dt$cars          # stores field as vector, as next step will zap it
dt=data.table(mtcars) # zaps field "cars"...
dt=cbind(dt,cars)

# part 2 - the communicating data.tables...
dt1=dt # make a new table
dt1[,cars:=str_replace(cars,"Valiant","Thingy")] # change something in the table
# now *both* tables have changed...

# try with data.frame
df=mtcars
df$cars=row.names(df)

df1=df
df1=transform(df1,cars=str_replace(cars,"Valiant","Thingy")) # works as expected
# now only df1  has changed. 
Henk
  • You've asked two questions in one here so it's difficult to deal with. The dup is more for part 2. Part 1 could be a new question but what precisely do you mean by "zap" ? – Matt Dowle Jan 23 '14 at 11:18
  • Thanks for pointing me to that link. It is quite a bit of text. The summary is: use dt1=copy(dt) instead of dt1=dt. – Henk Jan 23 '14 at 11:28
  • The link does not provide insight [for me] on the first part of my question: the zapped row.names. By "zapped" I mean that there were row.names or a field, and that they disappeared after the data.table conversion. Sorry for asking two questions in one, indeed not very clever of me. – Henk Jan 23 '14 at 11:29
  • Why are you storing the column of row.names in `dt` and then doing `dt <- data.table(mtcars)`? Shouldn't you be doing `dt <- data.table(dt)`? `mtcars` does not have the column `cars`... – Arun Jan 23 '14 at 11:34
  • @Arun: If I do data.table(cars), the row.names disappear. Therefore I add the cars column manually. But they disappear as well when I convert to data.table - that is the part I don't get. – Henk Jan 23 '14 at 12:22
  • You add `cars` to `dt` and then convert `mtcars` to a `data.table`. But `mtcars` doesn't have a column called `cars`... This is a very silly issue you've overlooked in your code. Please go through your code again. – Arun Jan 23 '14 at 13:23

1 Answer

Part 1

You have an error in your code.

You use dt=data.table(mtcars), where based on your description you meant to use dt=data.table(dt) or dt=data.table(mtcars, cars).

In other words, as @Arun pointed out in the comments, you add the cars column to dt (a copy of mtcars) and then overwrite dt with data.table(mtcars). You never made any modifications to mtcars itself, so the new data.table has no cars column.
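As a minimal sketch of the two fixes (the names `df`, `dt_a` and `dt_b` are made up for illustration): either convert the object that actually carries the `cars` column, or ask `data.table()` to keep the row names via its `keep.rownames` argument, which puts them in a column named `rn`.

```r
library(data.table)

# Option 1: add the column to the data.frame, then convert that same object.
df <- mtcars
df$cars <- row.names(df)
dt_a <- data.table(df)          # 'cars' survives because df itself has it

# Option 2: let data.table keep the row names; they land in a column named "rn".
dt_b <- data.table(mtcars, keep.rownames = TRUE)
setnames(dt_b, "rn", "cars")    # rename to match the column name in the question

"cars" %in% names(dt_a)
"cars" %in% names(dt_b)
```

Both checks at the end should report that the `cars` column is present. Option 2 avoids the intermediate data.frame step entirely.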

Part 2

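dt1=dt does not make an independent copy: both names refer to the same data, and := modifies it by reference, so the change shows up under both names. Have a look at ?copy and the other question that Matt pointed you to.
dt1 <- copy(dt)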
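The difference between plain assignment and `copy()` can be seen in a small sketch. The names `alias` and `deep` are hypothetical, and base `sub()` stands in for `str_replace()` so the example needs only data.table.

```r
library(data.table)

dt <- data.table(mtcars, keep.rownames = TRUE)  # row names go into column "rn"

alias <- dt        # plain assignment: 'alias' and 'dt' refer to the same table
deep  <- copy(dt)  # copy(): an independent table

# := updates the column in place, so the change is visible through 'dt' too
alias[, rn := sub("Valiant", "Thingy", rn)]

"Thingy" %in% dt$rn      # TRUE: 'dt' changed along with 'alias'
"Valiant" %in% deep$rn   # TRUE: the copy is untouched
```

If the original must stay unchanged, take the `copy()` before the first `:=`; copying afterwards only duplicates the already modified data.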

Ricardo Saporta
  • +1 Just to pick up that `dt1=dt` just points the `dt1` symbol at the same data `dt` is pointing to (no shallow copy there). `shallow()` is where the vector of column pointers is copied but then each slot of `dt1` pointed to the same column vectors in `dt`. – Matt Dowle Jan 23 '14 at 14:22