0

This is probably simple but I don't know where the problem is.

I have two R data frames that I am trying to left-merge using data.table. Claims data has 100,000 unique IDs and ID data has 60,000 unique IDs.

Claims[ID, on=id] 

However after the merge I am getting 100,000 unique IDs. Isn't this the code for a left merge?

When I try ID[Claims, on=id], I get 60,000 unique IDs. But this is the code for right merge.

I just want to say that I am a beginner R learner and this question may seem too pedestrian. Please give me some inputs on what is the correct code for a left join so that I am left with the 60k unique IDs from my 'ID' data..

Giving sample code-

id <- data.table(Id  = c("A", "B", "C", "C"),
                X1  = c(1L, 3L, 5L, 7L),
                XY  = c("x2", "x4", "x6", "x8")
               )

claims <- data.table(Id  = c("A", "B", "B", "D","E"),
                Y1  = c(1L, 3L, 5L, 7L,8L),
                XY  = c("y1", "y3", "y5", "y7","y9"))

m <- claims[id, on = "Id"]

length(unique(m$id)) #gives 0 
length(unique(claims$id)) #gives 0 
length(unique(id$id)) #gives 0 
Pam
  • 111
  • 6
  • Please provide a reproducible example. – daniellga Aug 15 '20 at 14:09
  • 1
    Please read the posting instructions at the top of the [tag:r] tag page. In particular questions need a minimal reproducible example which means cut the problem down to a small example inlcluding all code and inputs so that anyone can copy it and paste it into their session and see what you see. – G. Grothendieck Aug 15 '20 at 14:10
  • Just gave sample code. Thanks. – Pam Aug 15 '20 at 16:00
  • `id` is not the same as `Id`. – Henrik Aug 15 '20 at 16:07
  • Gosh! That is right. I am getting used to the case-sensitive variable names. I am coming from SAS. Thanks! – Pam Aug 15 '20 at 19:53
  • As a fellow SAS programmer who also had the data table learning curve, you have my sympathy. Admittedly, a tangent: If you were still in SAS, you'd be getting the "NOTE: MERGE statement has more than one data set with repeats of BY values." message, based on these two data sets. :-). This leads me to ask why you are mentioning unique IDs, since neither table has a unique index. I encourage you to look into using keys on your data tables. Although it is not required, typically in a left join situation. you will prefer to have no duplicate keys. – Michael Tuchman Aug 18 '20 at 04:41
  • Your code is correct, as written. And the case sensitive issues will drive you nuts. – Michael Tuchman Aug 18 '20 at 04:49

0 Answers0