-1

I would appreciate your help with this. I am trying to merge two datasets. Dataset1 has 12 observations (one observation for every month of the year). Dataset2 has 14 observations. The key variable is Month. The 2 additional observations in Dataset2 correspond to 2 additional rows in June. So, in the merged dataset I get three rows for June. The problem is that the merged dataset is replicating the information in Dataset1 to the two additional rows. I need those two additional rows to be NAs (as shown in the picture). I have tried using the merge and join functions and both replicate the data. Could you please show me how to solve this?

This is how the merge data should look

Andres H
  • 15
  • 6
  • 1
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because we cannot copy/paste the data into R. Most merging questions options are addressed already here: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – MrFlick Aug 17 '20 at 18:56

2 Answers2

0

What you're asking for is specifically and fundamentally opposite of what join/merge operations are supposed to to, so it won't happen within merge(...). However, you can make it happen after the fact.

dat1 <- data.frame(Key = c("January", "June"), Seller1 = c(500, 1000))
dat2 <- data.frame(Key = c("January", "June", "June", "June"), Seller2 = c(800, 2000, 1000, 2000))

merged <- merge(dat1, dat2, by = "Key", all = TRUE)
merged[,names(dat1)]
#       Key Seller1
# 1 January     500
# 2    June    1000
# 3    June    1000
# 4    June    1000
duplicated(merged[,names(dat1)])
# [1] FALSE FALSE  TRUE  TRUE
merged[duplicated(merged[,names(dat1)]), "Seller1"] <- NA
merged
#       Key Seller1 Seller2
# 1 January     500     800
# 2    June    1000    2000
# 3    June      NA    1000
# 4    June      NA    2000
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

We can also use full_join

library(dplyr)
full_join(df1, df2, by = 'Key') %>%
          mutate(Seller1 = replace(Seller1, duplciated(Seller1), NA))
akrun
  • 874,273
  • 37
  • 540
  • 662