0

This question was tagged as a duplicate, but I would like to give a nod to the answer to my question. The solutions provided in the other were not sufficient to my purposes (I tried them out to little success) and they also relied on packages that seemed nuanced and unnecessary. I appreciate the supplied answer to my question by user Stewart Ross because they were able to answer the question with a very simple solution using a package (dplyr) which is widely used and very familiar

Similar questions have been asked, and this seems pretty simple, but all the answers I'm finding have been yielding what I already have. I don't seem to be able to retain all the unique row values in both data frames. In my situation, I have two data frames that I'm merging by a key column "Code". However, df1 is from a previous year and so it has "Codes" that old codes that aren't found in df2. Likewise df2 has new codes that aren't found in df1.

So let's suppose they look like this:

df1
Code   Value1
A      10
B      20
C      50

df2
Code   Value2
A      30
D      40
E      60

With my current code I'm utilizing df3 <- left_join(df2, df1, by="Code")

So what I get is:

df3
Code   Value1   Value2
A      10       30
D      n.a.     40
E      n.a.     60

Which would be alright if I were only concerned with representing this year's "Codes". My desired output retains the old codes from the prior year. I'd like df3 to resemble the following:

df3
Code  Value1  Value2
A    10      30
B    20      n.a.
C    50      n.a.
D    n.a.    40
E    n.a.    60

So my issue is that I have two data frames with different Codes, and I want to merge them together while preserving all the codes. I haven't been able to accomplish this my using left_join, rbind, or subsetting.

Any help would be greatly appreciated!

1 Answers1

0

The full_join function in dplyr accomplishes this:

library(dplyr)
dfa = data.frame(code = c("A", "B" , "C"), value1 = c(10, 20, 50), stringsAsFactors = FALSE)
dfb = data.frame(code = c("A", "D" , "E"), value2 = c(30, 40, 60), stringsAsFactors = FALSE)
full_join(dfa, dfb)

Joining, by = "code"
  code value1 value2
1    A     10     30
2    B     20     NA
3    C     50     NA
4    D     NA     40
5    E     NA     60
Stewart Ross
  • 1,034
  • 1
  • 8
  • 10
  • Thank you, I appreciate this solution immensely as it uses dplyr and not some unfamiliar package that I have no experience in using. This is definitely the best solution I've found online so far. – R. Kersting Jan 28 '18 at 22:35