
While trying to replace NAs in a data.frame, I discovered some odd behaviour in data.table::set(). I'm not used to the data.table syntax, but set() is supposed to work on a data.frame as well.

The following code shows the problem:

library(data.table)

set.seed(123)
data <- data.frame(replicate(5, sample(c(1, NA), 5, replace = TRUE)))

dt <- as.data.table(data)
dt_red <- dt[-1, -2]

df <- data
df_red <- df[-1, -2]

sum(is.na(data))
sum(is.na(dt))
sum(is.na(dt_red))
sum(is.na(df))
sum(is.na(df_red))

ind <- as.integer(c(2,3))
for (j in ind)
  set(data,which(is.na(data[[j]])),j,0) 

sum(is.na(data))
sum(is.na(dt))
sum(is.na(dt_red))
sum(is.na(df))
sum(is.na(df_red))

The loop is supposed to replace the NAs in columns 2 and 3 of data with 0, which it does. It also replaces the NAs in df, which it really shouldn't. If data is saved as a data.table (dt) or subsetted first (dt_red, df_red), nothing is changed in those objects.

Any ideas?

P.S. This post is not about how to change the script to make it work (I did that), but about understanding how set() is able to change df when it is only given data to work with.

  • `data.table` works by reference (https://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another). If you don't want modifications to `data` to show up in `df`, you should use `df <- copy(data)` – denis Sep 19 '18 at 09:34
  • Relevant: http://r.789695.n4.nabble.com/Confused-about-NAMED-td4103326.html and https://stackoverflow.com/questions/10655287/r-deep-vs-shallow-copies-pass-by-reference. Your question comes down to when R makes a deep copy of a data.frame. Try running `tracemem(df)` and `tracemem(data)` right after `df <- data` – chinsoon12 Sep 19 '18 at 09:44
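A minimal sketch of what the two comments describe, as far as I understand it: a plain `df <- data` does not copy the columns right away, so when set() updates a column by reference the alias sees the change too, whereas `copy()` gives an independent object. The small frame and the names `df_alias` and `df_copy` are made up for illustration and are not from the question.

```r
library(data.table)

# Small frame with NAs in column 2 (values chosen for illustration only)
data <- data.frame(a = c(1, NA, 3), b = c(NA, 2, NA))

df_alias <- data        # plain assignment: no deep copy is made at this point
df_copy  <- copy(data)  # data.table::copy() forces a deep copy

# set() writes into column 2 of `data` in place, so R's usual
# copy-on-modify step never gets a chance to separate the two objects
set(data, which(is.na(data[[2]])), 2L, 0)

na_data  <- sum(is.na(data[[2]]))      # NAs left in the frame passed to set()
na_alias <- sum(is.na(df_alias[[2]]))  # NAs left in the plain-assignment alias
na_copy  <- sum(is.na(df_copy[[2]]))   # NAs left in the deep copy

print(c(data = na_data, alias = na_alias, copy = na_copy))
```

This would also be consistent with dt, dt_red and df_red staying unchanged in the question: as.data.table() and the `[-1, -2]` subsetting presumably build new column vectors, so they no longer share memory with data.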
