data.table merging and removing zeros in R

Question

i have two large datasets and merged them on a uniqueID:

df3 <- merge(df1, df2, by = "ID", all.x = TRUE)

is this the same as df1[df2, on = "ID"] ?

How can i replace NAs with 0 in a single column in df3? df2 only has two columns: value and ID. By having all.x = TRUE a lot of NAs are introduced in the column value. I want to replace them with zero, but keep getting errors. How can this be done?

Please make a reproducible example. – s_baldur May 19 '20 at 10:49 — s_baldur, May 19 '20 at 10:49

score 0 · Accepted Answer · answered May 19 '20 at 11:16

Did it look the same when you tried it?

Answer: No.

Since you didn't give an example, I will try.

library(data.table) 
df1 <- data.table(ID=2:5,a=1:4)
df2 <- data.table(ID=3:7,b=11:15)
df1
#    ID a
# 1:  2 1
# 2:  3 2
# 3:  4 3
# 4:  5 4

df2
#    ID  b
# 1:  3 11
# 2:  4 12
# 3:  5 13
# 4:  6 14
# 5:  7 15

merge(df1,df2,by="ID",all.x=TRUE)
#    ID a  b
# 1:  2 1 NA
# 2:  3 2 11
# 3:  4 3 12
# 4:  5 4 13

df1[df2,on="ID"]
#    ID  a  b
# 1:  3  2 11
# 2:  4  3 12
# 3:  5  4 13
# 4:  6 NA 14
# 5:  7 NA 15

They are not the same. df1[df2 is a right join, keeping all of df2. The merge has all.x=TRUE which keeps all of df1.

This is the same as your merge:

df2[df1,on="ID"]
#    ID  b a
# 1:  2 NA 1
# 2:  3 11 2
# 3:  4 12 3
# 4:  5 13 4

And this is the merge the other way:

merge(df1,df2,by="ID",all.y=TRUE)
#    ID  a  b
# 1:  3  2 11
# 2:  4  3 12
# 3:  5  4 13
# 4:  6 NA 14
# 5:  7 NA 15

And this changes NA to 0:

df3 <- df2[df1,on="ID"][is.na(b),b:=0] 
df3
#    ID  b a
# 1:  2  0 1
# 2:  3 11 2
# 3:  4 12 3
# 4:  5 13 4

data.table merging and removing zeros in R

1 Answers1