0

I have two dataframes df1 and df2, I am looking for the simplest operation to get df3.

I want to replace rows in df1 with rows from df2 if id match (so rbind.fill is not a solution), and append rows from df2 where id does not exist in df1but only for columns that exist in df2.

I guess I could use several joins and antijoins and then merge but I wonder if there already exists a function for that operation.

df1 <- data.frame(id = 1:5, c1 = 11:15, c2 = 16:20, c3 = 21:25)
df2 <- data.frame(id = 4:7, c1 = 1:4, c2 = 5:8)

df1
  id c1 c2 c3
  1 11 16 21
  2 12 17 22
  3 13 18 23
  4 14 19 24
  5 15 20 25

df2
  id c1 c2
  4  1  5
  5  2  6
  6  3  7
  7  4  8

df3
  id c1 c2 c3
  1  11 16 21
  2  12 17 22
  3  13 18 23
  4  1  5  24
  5  2  6  25
  6  3  7  NULL
  7  4  8  NULL
vwrobel
  • 1,706
  • 15
  • 25
  • This has been asked before and the answer is a variant of rbind.list, namely the rbind.fill function in plyr. – IRTFM Nov 14 '18 at 18:56
  • I dont think that this function helps me since I need some replacement of values based on id. – vwrobel Nov 14 '18 at 19:06
  • 1
    OK. I think the answer to "is there a function that does all of that in one go" is "no". If you know how to doi it with SQL then you might look at the `sqldf` package. I'm also not sure your verbal description completely covers the desired results. – IRTFM Nov 14 '18 at 19:31

2 Answers2

0

I ended up with :

special_combine <- function(df1, df2){
  df1_int <- df1[, colnames(df1) %in% colnames(df2)]
  df1_ext <- df1[, c("id", colnames(df1)[!colnames(df1) %in% colnames(df2)])]
  df3 <- bind_rows(df1_int, df2)
  df3 <- df3[!duplicated(df3$id, fromLast=TRUE), ] %>% 
    dplyr::left_join(df1_ext, by="id") %>% 
    dplyr::arrange(id)
  df3
}
vwrobel
  • 1,706
  • 15
  • 25
0

We can use {powerjoin}, make a full join and deal with the conflicts using coalesce_xy (which is really dplyr::coalesce) :

library(powerjoin)
df1 <- data.frame(id = 1:5, c1 = 11:15, c2 = 16:20, c3 = 21:25)
df2 <- data.frame(id = 4:7, c1 = 1:4, c2 = 5:8)
safe_full_join(df1, df2, by= "id", conflict = coalesce_xy)
#   id c1 c2 c3
# 1  1 11 16 21
# 2  2 12 17 22
# 3  3 13 18 23
# 4  4 14 19 24
# 5  5 15 20 25
# 6  6  3  7 NA
# 7  7  4  8 NA
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167