0

I want to do a left join with 2 data.frames on R, using data.table library. What I have:

library(data.table)
id<-c("a1","a2","a3","a4")
id2<-c("a2","a3","a1","a4")
y<-c(1,2,3,4)
z<-c(3,5,6,7)
k<-c(1,3,8,7)

df1<-data.table(id,y,z)

id<-c("a2","a3","a1","a4")
df2<-data.table(id,k,y)

I want that the result is a new data.table frame, being this the result of a LEFT JOIN, this is:

result--> id,x,y,z

I use this as a guide: https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html

merge(df1,df2,by="id",all.x=TRUE)

But this return me:

   id y.x z x y.y
1: a1   1 3 3   3
2: a2   2 5 0   1
3: a3   3 6 2   2
4: a4   4 7 1   4

The problem with this is that column y is duplicated, and I want that only appear once.

I have tried with all=FALSE, all.x=T,... but I dont achieve what I want.

I have also tried other solutions, as proposed in: left join in data.table

setkey(df1,id)
setkey(df2,id)
df1[df2]

But this again, duplicate the y column.

   id y z k i.y
1: a1 1 3 8   3
2: a2 2 5 1   1
3: a3 3 6 3   2
4: a4 4 7 7   4

How can I do it?

pogibas
  • 27,303
  • 19
  • 84
  • 117
Jesus
  • 462
  • 6
  • 13

1 Answers1

2

You can merge df1 and df2 by removing column y in one of the tables. Try dplyr::left_join(df1, df2[, -c("y")], by = "id") or merge(df1, df2[, -c("y")], by = "id").

nghauran
  • 6,648
  • 2
  • 20
  • 29
  • This is not a good solution because I have more than 2000 equal columns and doing it manually its hard!. Thanks anyway – Jesus Oct 10 '17 at 09:16
  • If columns are really identical between the tables, something like this should work `merge(df1, df2)` (like [here](https://stackoverflow.com/questions/22480836/merge-data-frame-but-keep-only-unique-columns), @thank Hardik gupta). An other option is to set a vector containing all the duplicated columns with `colnames(df1)[(colnames(df1) %in% colnames(df2))]` and then use this vector in the `merge` function (either in the `by` argument or to subset one of the tables). – nghauran Oct 10 '17 at 09:31