I would appreciate your help with this. I am trying to merge two datasets. Dataset1 has 12 observations (one observation for every month of the year). Dataset2 has 14 observations. The key variable is Month. The 2 additional observations in Dataset2 correspond to 2 additional rows in June. So, in the merged dataset I get three rows for June. The problem is that the merged dataset is replicating the information in Dataset1 to the two additional rows. I need those two additional rows to be NAs (as shown in the picture). I have tried using the merge and join functions and both replicate the data. Could you please show me how to solve this?
Asked
Active
Viewed 101 times
-1
-
1It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data are not helpful because we cannot copy/paste the data into R. Most merging questions options are addressed already here: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – MrFlick Aug 17 '20 at 18:56
2 Answers
0
What you're asking for is specifically and fundamentally opposite of what join/merge operations are supposed to to, so it won't happen within merge(...)
. However, you can make it happen after the fact.
dat1 <- data.frame(Key = c("January", "June"), Seller1 = c(500, 1000))
dat2 <- data.frame(Key = c("January", "June", "June", "June"), Seller2 = c(800, 2000, 1000, 2000))
merged <- merge(dat1, dat2, by = "Key", all = TRUE)
merged[,names(dat1)]
# Key Seller1
# 1 January 500
# 2 June 1000
# 3 June 1000
# 4 June 1000
duplicated(merged[,names(dat1)])
# [1] FALSE FALSE TRUE TRUE
merged[duplicated(merged[,names(dat1)]), "Seller1"] <- NA
merged
# Key Seller1 Seller2
# 1 January 500 800
# 2 June 1000 2000
# 3 June NA 1000
# 4 June NA 2000

r2evans
- 141,215
- 6
- 77
- 149
0
We can also use full_join
library(dplyr)
full_join(df1, df2, by = 'Key') %>%
mutate(Seller1 = replace(Seller1, duplciated(Seller1), NA))

akrun
- 874,273
- 37
- 540
- 662