Merge datasets in R

Question

I would appreciate your help with this. I am trying to merge two datasets. Dataset1 has 12 observations (one observation for every month of the year). Dataset2 has 14 observations. The key variable is Month. The 2 additional observations in Dataset2 correspond to 2 additional rows in June. So, in the merged dataset I get three rows for June. The problem is that the merged dataset is replicating the information in Dataset1 to the two additional rows. I need those two additional rows to be NAs (as shown in the picture). I have tried using the merge and join functions and both replicate the data. Could you please show me how to solve this?

This is how the merge data should look

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because we cannot copy/paste the data into R. Most merging questions options are addressed already here: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right — MrFlick, Aug 17 '20 at 18:56

score 0 · Answer 1 · answered Aug 17 '20 at 19:00

What you're asking for is specifically and fundamentally opposite of what join/merge operations are supposed to to, so it won't happen within merge(...). However, you can make it happen after the fact.

dat1 <- data.frame(Key = c("January", "June"), Seller1 = c(500, 1000))
dat2 <- data.frame(Key = c("January", "June", "June", "June"), Seller2 = c(800, 2000, 1000, 2000))

merged <- merge(dat1, dat2, by = "Key", all = TRUE)
merged[,names(dat1)]
#       Key Seller1
# 1 January     500
# 2    June    1000
# 3    June    1000
# 4    June    1000
duplicated(merged[,names(dat1)])
# [1] FALSE FALSE  TRUE  TRUE
merged[duplicated(merged[,names(dat1)]), "Seller1"] <- NA
merged
#       Key Seller1 Seller2
# 1 January     500     800
# 2    June    1000    2000
# 3    June      NA    1000
# 4    June      NA    2000

score 0 · Answer 2 · answered Aug 17 '20 at 19:05

0

We can also use full_join

library(dplyr)
full_join(df1, df2, by = 'Key') %>%
          mutate(Seller1 = replace(Seller1, duplciated(Seller1), NA))

answered Aug 17 '20 at 19:05

akrun

874,273
37
540
662

Merge datasets in R

2 Answers2