0

I have many datasets with two variables-essentially the Year and the value. The dataset goes from 1960-2019, however some variables skip a year or two, or are only from 1990-2019, etc. This is an example:

---LND.KM---              ---LND.PERCENT---
Year  lnd.km              Year    lnd.prcnt
2018  315000              2018     13.864
2017  315000              2015     14.203
2016  320000              2014     14.159
2015  322000              2012     13.850
2014  321000        

I want to add all of the datasets in to one large dataset, with a Year column spanning from 1960-2019- -and bind each value column to it's actual year it is connected to. Like this:

---------BIG.DF------------
Year   lnd.km      lnd.prcnt
2019     
2018    315000      13.864  
2017    315000      
2016    320000
2015    322000      14.203
2014    321000      14.159
2013
2012                13.850

How would I achieve this?

Thanks.

r2evans
  • 141,215
  • 6
  • 77
  • 149

2 Answers2

0

You can use

full_join(one_df, the_other_df, by=c("Year"="Year"))
intedgar
  • 631
  • 1
  • 11
0

This is just a join/merge operation.

out <- merge(df1, df2, by = "Year", all = TRUE)
out
#   Year lnd.km lnd.prcnt
# 1 2012     NA    13.850
# 2 2014 321000    14.159
# 3 2015 322000    14.203
# 4 2016 320000        NA
# 5 2017 315000        NA
# 6 2018 315000    13.864

The only thing different is that you are asking R to infer that 2019 should be in the data. For that, extend the above with

rbind(out, transform(out[NA,][seq_along(needyears),], Year = needyears), make.row.names = FALSE)
#   Year lnd.km lnd.prcnt
# 1 2012     NA    13.850
# 2 2014 321000    14.159
# 3 2015 322000    14.203
# 4 2016 320000        NA
# 5 2017 315000        NA
# 6 2018 315000    13.864
# 7 2013     NA        NA
# 8 2019     NA        NA

and order as desired.


Data

df1 <- structure(list(Year = 2018:2014, lnd.km = c(315000L, 315000L, 320000L, 322000L, 321000L)), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Year = c(2018L, 2015L, 2014L, 2012L), lnd.prcnt = c(13.864, 14.203, 14.159, 13.85)), class = "data.frame", row.names = c(NA, -4L))
r2evans
  • 141,215
  • 6
  • 77
  • 149