Dplyr: looking for a join with NAs on a panel dataset

Question

I have two tables which look like the following:

Table "data"                                 Table "countries"
------------------------------               -----------------
ID Time Country Variable Value               ID Country
1  2013 Austria V1       150                 1  Austria
1  2014 Austria V1       170                 2  Germany
1  2015 Austria V1       200                 3  Spain
2  2013 Germany V1       280
2  2015 Germany V1       350

The "data" table is a panel dataset which consits of multiple countries in multiple time periods and multiple variables (even though i just included one variable in the example). The "countries" table is a full table of all countries - here three.

What I need to do is a join which fills the missing observations in the "data" table with NAs for the value for each year! After the join the "data" table should look like this:

Table "data"                                 
------------------------------               
ID Time Country Variable Value               
1  2013 Austria V1       150                 
1  2014 Austria V1       170                 
1  2015 Austria V1       200                 
2  2013 Germany V1       280
2  2014 Germany V1       NA
2  2015 Germany V1       350
3  2013 Spain   V1       NA
3  2014 Spain   V1       NA
3  2015 Spain   V1       NA

I think the a right join is the right direction and it also works, but it includes the NAs just once. In the above example, Spain would only be included once in the "data" table. I did a simple join like this:

joined <- right_join(data, countries, by = c("ID"))

Is there something like a "repetitive join" for panel datasets or do I need to split the dataset into time periods, join them separately and append them afterwards?

Thank you!

It would be nicer if you shared your data in a [reproducible form](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so they can be copy/pasted into R for testing. — MrFlick, May 18 '16 at 14:24
I'm sorry - I was in a hurry during work as I wrote the question, but next time I am going to include a reproducible example. — Martin G., May 19 '16 at 09:18

erc · Accepted Answer · 2016-05-18T14:45:05.453

6

You can use complete from package tidyr:

library(tidyr)
right_join(data, countries) %>%
  complete(Time, Variable, nesting(Country, ID)) 

   Time Variable Country    ID Value
  (int)    (chr)   (chr) (int) (int)
1  2013       V1 Austria     1   150
2  2013       V1 Germany     2   280
3  2013       V1   Spain     3    NA
4  2014       V1 Austria     1   170
5  2014       V1 Germany     2    NA
6  2014       V1   Spain     3    NA
7  2015       V1 Austria     1   200
8  2015       V1 Germany     2   350
9  2015       V1   Spain     3    NA

edited May 18 '16 at 14:45

answered May 18 '16 at 14:25

erc

10,113
11
57
88

Thank you very much! That is exactly what I need and it works 100%! – Martin G. May 19 '16 at 09:19

Dplyr: looking for a join with NAs on a panel dataset

1 Answers1