0

I have two datasets that look like this that I am having difficulty with merging.

I've already tried:

ndf <- merge(df1, df2, by=c("state", "year")) 

but it ended up with a data frame with 200,000 observations. Here are my two example data sets, df1 is empty in the "income" and "local_income" column:

df1                                  df2
state year  income  local_income     state year  income local_income
CA    1992                           CA    1992   1       1 
CA    1993                           NV    1992   4       3
CA    1994                           CO    1992   3       2
CA    1995                           
CA    1996 
NV    1992 
NV    1993 
NV    1994 
NV    1995 
NV    1996 
CO    1992 
CO    1993
CO    1994
CO    1995
CO    1996


Essentially what I want to do is merge these two datasets to look like this:

df3
state year  income  local_income     
CA    1992  1         1              
CA    1993                           
CA    1994                          
CA    1995                           
CA    1996 
NV    1992  4         3
NV    1993 
NV    1994 
NV    1995 
NV    1996 
CO    1992  3         2
CO    1993
CO    1994
CO    1995
CO    1996

And then I'll eventually go on to merging for each year. But this is a good start to get me going. Any help will be greatly appreciated! This would other wise take me 8+ hours to do with all the data I have, so I'm excited to see the power of r and its community!

Emma
  • 31
  • 1
  • 4
  • `merge(df1, df2, by=c("state", "year"), all.x = TRUE)` ? – Ronak Shah Aug 12 '19 at 11:15
  • `merge(df1[,1:2], df2, by=c("state", "year"), all.x = TRUE)` – Roland Aug 12 '19 at 11:18
  • Just tried it, it produces a dataset with 3,000,000 observations. – Emma Aug 12 '19 at 11:18
  • 2
    @Emma That can only happen if state and year do not identify unique observations in these data.frames. It appears you have multiple entries with the same states and years. This results in a kind of *cartesian join*. – Roland Aug 12 '19 at 11:24

1 Answers1

0

You can also try the dplyr version.

library(dplyr)
df3 <- full_join(df1, df2, by=c("state", "year"))