While trying to replace NAs in a data.frame, I discovered a weird behaviour of data.table::set(). I'm not used to the data.table syntax, but set() is supposed to work for data.frame as well.
The following code shows the problem:
library(data.table)
set.seed(123)
data <- data.frame(replicate(5, sample(c(1, NA), 5, replace = TRUE)))
dt <- as.data.table(data)
dt_red <- dt[-1, -2]
df <- data
df_red <- df[-1, -2]
sum(is.na(data))
sum(is.na(dt))
sum(is.na(dt_red))
sum(is.na(df))
sum(is.na(df_red))
ind <- as.integer(c(2,3))
for (j in ind)
  set(data, which(is.na(data[[j]])), j, 0)
sum(is.na(data))
sum(is.na(dt))
sum(is.na(dt_red))
sum(is.na(df))
sum(is.na(df_red))
The loop is supposed to replace the NAs in columns 2 and 3 of data with 0, which it does. It also replaces the NAs in df, which it really shouldn't. The copies that were converted to a data.table or subsetted (dt, dt_red and df_red) are left unchanged.
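To narrow this down, here is a minimal sketch of a check one could run. It assumes that data.table's address() reports the memory location of an object; the tiny data frame and the names same_before, same_after and na_left_in_df are made up for illustration and are not part of the script above.

```r
library(data.table)

# Small stand-in for the data above (values are illustrative only)
data <- data.frame(a = c(1, NA, 1), b = c(NA, 1, NA))
df <- data   # plain assignment, as in the script above

# Compare where the two names point before calling set()
same_before <- address(data) == address(df)

# Same replacement as in the loop above, for column 2 only
set(data, which(is.na(data[[2L]])), 2L, 0)

# Compare again, and count the NAs that remain in df
same_after <- address(data) == address(df)
na_left_in_df <- sum(is.na(df))

print(c(same_before = same_before, same_after = same_after))
print(na_left_in_df)
```

If both comparisons come out TRUE, then df and data would be the same object in memory, and set() modifying that object in place would account for the change in df.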
Any ideas?
P.S.
The post is not about how to change the script to make it work (I did that), but to help me understand how set() is allowed to change df while only being given data to work with.