0

I have two data.frames like this:

df1:

 col1 col2 col3
 1     2     20
 2     4     30
 3     4     25

df2:

 colX colY colZ
  5     12     
  1     9      
  1     2     

Now I want to assign values to colZ in df2. This has to be the number in col3 of df1 when the values in col1 equals colX and col2 equals colY

Desired output:
df2:

 colX colY colZ
  5     12  NA  
  1     9   NA  
  1     2   20

I hope that someone can help me

jranzijn
  • 31
  • 7

1 Answers1

0

Using the merge() function with a sufficient number of bells and whistles will get you the result you want. The trick here is using by.x=c(...) and by.y=c(...) to specify multiple columns on which to join during the merge. There is also some massaging to get the column names as you want in your desired output.

result <- merge(df1, df2, by.x=c("col1", "col2"), by.y=c("colX", "colY"), all.y=TRUE)[, 1:3]
names(result) <- c("colX", "colY", "colZ")

> result
  colX colY colZ
1    1    2   20
2    1    9   NA
3    5   12   NA
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • should it not be merge(df,df2 ,by.x = c("col1","col2"), by.y = c("colX","colY"), all.y = TRUE)? – Marsenau May 17 '16 at 14:05
  • Please leave a comment, thanks. – Tim Biegeleisen May 17 '16 at 14:09
  • Hi Tim, thanks for the answer. Something is going wrong here, It removes some rows while it create duplicates of many others. – jranzijn May 17 '16 at 14:14
  • I updated my answer a couple of minutes ago. Try it again. – Tim Biegeleisen May 17 '16 at 14:15
  • Thanks again. It still create many more rows. I started with 100.000, now i got almost 230.000 of them. I have to check where the duplicates come from. – jranzijn May 17 '16 at 14:22
  • My guess is that you intended to do an outer join and instead my answer is giving you an inner join. Put another way, your example problem is not representative of what you are really trying to do. – Tim Biegeleisen May 17 '16 at 14:23
  • Ok, thanks for the info. I do have many more columns in Df2 but that should not make i difference right? Can't see any other differences. – jranzijn May 17 '16 at 14:31
  • Your example does not represent your real data set. If there be multiple matches this could lead to "extra" rows. My code may be correct and your problem statement could be incomplete. – Tim Biegeleisen May 17 '16 at 14:33