I'm using R plyr
's join()
function to join two dataframes by columns, "country" and "year," over certain tax treaties. However, the "country" and "year" columns of the two dataframes carry different length with different number of unique values, something like the following.
df1
=
country year treaty_id
A 1995 1
B 1995 1
C 1995 1
D 1995 1
A 2001 2
B 2001 2
D 2001 2
C 2015 3
D 2015 3
while df2
has the complete lists of "country" and "year" values
country year
A 1992
B 1992
C 1992
D 1992
E 1992
F 1992
G 1992
H 1992
A 1993
B 1993
C 1993
… …
G 2015
H 2015
for "country" A to H and "year" 1992 to 2015.
Is there anyway to match df1
and df2
by the "year" information given in df1
(when a particular tax treaty was signed) and expand the list of country from A to H grouped by treaty_id
. (Since there may be more than one treaties signed in the same year, so it needs to be grouped by treaty_id
)
Such that I can generate another dummy variable member
to indicate each country's membership in a tax treaty. The output should look something like this
country year treaty_id member
A 1995 1 1
B 1995 1 1
C 1995 1 1
D 1995 1 1
E 1995 1 0
F 1995 1 0
G 1995 1 0
H 1995 1 0
A 2001 2 1
B 2001 2 1
C 2001 2 0
D 2001 2 1
E 2001 2 0
F 2001 2 0
G 2001 2 0
H 2001 2 0
… … …
H 2015 n 1
It will really appreciated if anyone could enlighten me on this.