0

My problem is similar to this question: Conditional merge/replacement in R

But I would like to merge by multiple colums, in my example by x1 and x2 (in real life with more than 2 columns).

Of note, those conditions result in unique combinations (unique rows)!

df1 <- data.frame(x1 = c(1,1,2,2), x2 = rep(letters[1:2],2), x3 = letters[1:4], stringsAsFactors = FALSE )
df2 <- data.frame(x1 = c(1,2,3), x2 = letters[1:3], x3 = c('xx','zz','yy'), stringsAsFactors = FALSE )

df1
   x1 x2 x3
1  1  a  a
2  1  b  b
3  2  a  c
4  2  b  d

df2
  x1 x2 x3
1  1  a xx
2  2  b zz
3  3  c yy

Desired output

   x1 x2 x3  # 'merge/replaced' df1$x3 with df2$x3, by x1 & x2 
1  1  a xx
2  1  b  b
3  2  a  c
4  2  b zz

The following works in general and gives the desired output for the example:

require(dplyr)
anti1 <- anti_join(df1, df2, by = c('x1','x2'))
semi2 <- semi_join(df2, df1, by = c('x1','x2'))
full_join(semi2,anti1)

x1 x2 x3
1  1  a xx
2  2  b zz
3  1  b  b
4  2  a  c    # the row order is not important for me

But this doesn't seem really smart to me and also I do not know how this will behave when I have data frames with columns that are not part of the conditions.

I am sorry if this one should be a dup. If so, please hint me towards it. If you should have a better suggestion than my join approach: Thanks! I am open for many packages, but I am not really a data table guy...

tjebo
  • 21,977
  • 7
  • 58
  • 94

1 Answers1

1

I'm not sure if you would consider this 'smarter', but here is a way to do it with just one join call:

library(dplyr)

left_join(df1, df2, by = c('x1', 'x2')) %>% 
    mutate(x3 = if_else(is.na(x3.y), x3.x, x3.y)) %>% 
    select(-x3.y, -x3.x)

  x1 x2 x3
1  1  a xx
2  1  b  b
3  2  a  c
4  2  b zz
C. Braun
  • 5,061
  • 19
  • 47
  • Thanks. I was actually wondering if mutate might be an approach. Am currently on the way home and can’t run the code but it looks already quite good to me. – tjebo Mar 26 '18 at 14:46
  • As already assumed, it works indeed nicely. Good thing with mutate. Very flexible also which column of the other data frame to use for the replacement values. Thanks. However, this still seems like a rather 'quick and dirty' approach to me (which is fine for me). Maybe someone will have another suggestion? Although - probably no one will look at the question once I accept this answer :D – tjebo Mar 26 '18 at 23:31