1

I have two dataframe with different columns that has large number of rows (about 2 million)

The first one is df1

enter image description here

The second one is df2

enter image description here

I need to get match the values in y column from table one to R column in table two Example: see the two rows in df1 in red box have matched the two rows in df2 in red box enter image description here

Then I need to get the score of the matched values

so the result should look like this and it should be stores in a dataframe:

enter image description here

My attempt : first Im beginner in R, so when I searched I found that I can use Match function, merge function but I did not get the result that I want it might because I did not know how to use them correctly, therefore, I need step by step very simple solution

user8863554
  • 167
  • 1
  • 12
  • Please don't post images of data. Code-formatted print results are ok; [`dput` results are better](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – alistaire Mar 07 '18 at 04:02

2 Answers2

2

We can use match from base R

df2[match(df2$R, df1$y, nomatch = 0), c("R", "score")]
#   R score
#3   2     3
#4 111     4

Or another option is semi_join from dplyr

library(dplyr)
semi_join(df2[-1], df1, by = c(R = "y"))
#    R score
#1   2     3
#2 111     4
akrun
  • 874,273
  • 37
  • 540
  • 662
  • thanks, that worked but I have another question how do I store the UNmathced ones and get there scores ? (meaning the opposite of what we did ) – user8863554 Mar 07 '18 at 07:02
  • @user8863554 For that you need `anti_join` i.e. `anti_join(df2[-1], df1, by = c(R = "y"))` – akrun Mar 07 '18 at 07:03
  • one last questions, the values in R column could be numbers, empty (means the cell is empty) , or has NA values. How I can specify the values that I want when I do the match ? – user8863554 Mar 07 '18 at 20:32
  • @user8863554 In that case, `filter` out the values that you don't want and then do the join i.e. `df2[-1] %>% filter(R != "") %>% semi_join(df1, by = c(R = "y"))` – akrun Mar 08 '18 at 01:18
1
 merge(df1,df2,by.x="y",by.y="R")[c("y","score")]
    y score
1   2     3
2 111     4
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • thanks, that worked but I have another question how do I store the UNmathced ones and get there scores ? (meaning the opposite of what we did ) – user8863554 Mar 07 '18 at 07:02