I've come across something a bit above my skill set. I'm working with IMF trade data reported between country dyads. The dataset contains 'unordered duplicate' records because each country reports its trade data individually. However, due to differences in timing, recording systems, regime type, etc., there are discrepancies between corresponding values. I'm trying to manipulate this data in two ways:
- Assign the mean values to the duplicated dyads.
- Assign the dyad values conditionally based on a separate economic indicator or development index (who do I trust more?).
There are several discussions of identifying unordered duplicates here, here, here, and here, but after a couple of days of searching I have yet to see what I'm trying to do.
Here is an example of the raw data. In reality there are many more variables and several hundred thousand dyads:
library(data.table)
reporter<-c('USA','GER','AFG','FRA','CHN')
partner<-c('AFG','CHN','USA','CAN','GER')
year<-c(2010,2010,2010,2009,2010)
import<-c(-1000,-2000,-2400,-1200,-2000)
export<-c(2500,2200,1200,2900,2100)
rep_econ1<-c(28,32,12,25,19)
imf<-data.table(reporter,partner,year,import,export,rep_econ1)
imf
reporter partner year import export rep_econ1
1: USA AFG 2010 -1000 2500 28
2: GER CHN 2010 -2000 2200 32
3: AFG USA 2010 -2400 1200 12
4: FRA CAN 2009 -1200 2900 25
5: CHN GER 2010 -2000 2100 19
The additional wrinkle is that import and export are inverses of each other between the dyads (one country's import is its partner's export, with the opposite sign), so they need to be matched and averaged on their absolute values.
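To make the matching concrete, here is a minimal sketch of one way to tag both directions of a dyad with the same order-independent key and to spot solo records. The column names `dyad_id` and `n_reports` are made up for illustration, and this only identifies the pairs; it does not reconcile the values.

```r
library(data.table)

imf <- data.table(
  reporter = c('USA','GER','AFG','FRA','CHN'),
  partner  = c('AFG','CHN','USA','CAN','GER'),
  year     = c(2010,2010,2010,2009,2010),
  import   = c(-1000,-2000,-2400,-1200,-2000),
  export   = c(2500,2200,1200,2900,2100)
)

# Order-independent key: the alphabetically smaller code always comes first,
# so USA/AFG and AFG/USA both map to the same dyad_id (hypothetical name).
imf[, dyad_id := paste(pmin(reporter, partner), pmax(reporter, partner), sep = "_")]

# Number of reports per dyad-year; a value of 1 marks a solo record.
imf[, n_reports := .N, by = .(dyad_id, year)]

imf
```

With the sample data, the FRA/CAN row is the only one with `n_reports` equal to 1.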
For objective 1, the resulting data.table is:
Mean
reporter partner year import export rep_econ1
USA AFG 2010 -1100 2450 28
GER CHN 2010 -2050 2100 32
AFG USA 2010 -2450 1100 12
FRA CAN 2009 -1200 2900 25
CHN GER 2010 -2100 2050 19
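The arithmetic behind the table above can be sketched as a self-join against a "mirror" copy of the data in which reporter and partner are swapped. This is only a rough sketch on the sample data, not a tested solution for the full dataset; `mirror`, `res`, `m_import`, and `m_export` are names invented here.

```r
library(data.table)

imf <- data.table(
  reporter  = c('USA','GER','AFG','FRA','CHN'),
  partner   = c('AFG','CHN','USA','CAN','GER'),
  year      = c(2010,2010,2010,2009,2010),
  import    = c(-1000,-2000,-2400,-1200,-2000),
  export    = c(2500,2200,1200,2900,2100),
  rep_econ1 = c(28,32,12,25,19)
)

# Mirror table: swap reporter and partner so each row can look up
# what its counterpart reported for the same year.
mirror <- imf[, .(reporter = partner, partner = reporter, year,
                  m_import = import, m_export = export)]

# Update join on a copy; rows without a counterpart get NA in the m_* columns.
res <- copy(imf)
res[mirror, on = .(reporter, partner, year),
    `:=`(m_import = i.m_import, m_export = i.m_export)]

# A reporter's import corresponds to the partner's export and vice versa,
# so average the absolute values and restore the sign convention.
res[!is.na(m_export), `:=`(
  import = -(abs(import) + abs(m_export)) / 2,
  export =  (abs(export) + abs(m_import)) / 2
)]

# Drop the helper columns; solo records (FRA/CAN) are left untouched.
res[, c("m_import", "m_export") := NULL]
res
```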
For objective 2:
Conditionally Assign on Higher Economic Indicator (rep_econ1)
reporter partner year import export rep_econ1
USA AFG 2010 -1000 2500 28
GER CHN 2010 -2000 2200 32
AFG USA 2010 -2500 1000 12
FRA CAN 2009 -1200 2900 25
CHN GER 2010 -2200 2000 19
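The rule behind this second table can be sketched with the same mirror-join idea: where the partner has the higher rep_econ1, overwrite the reporter's figures with the partner's, swapping import and export and flipping the sign. Again a rough sketch under assumptions: `mirror`, `res`, and the `m_*` columns are invented names, and ties are assumed to keep the reporter's own values.

```r
library(data.table)

imf <- data.table(
  reporter  = c('USA','GER','AFG','FRA','CHN'),
  partner   = c('AFG','CHN','USA','CAN','GER'),
  year      = c(2010,2010,2010,2009,2010),
  import    = c(-1000,-2000,-2400,-1200,-2000),
  export    = c(2500,2200,1200,2900,2100),
  rep_econ1 = c(28,32,12,25,19)
)

# Mirror table carrying the counterpart's figures and its indicator.
mirror <- imf[, .(reporter = partner, partner = reporter, year,
                  m_import = import, m_export = export, m_econ = rep_econ1)]

res <- copy(imf)
res[mirror, on = .(reporter, partner, year),
    `:=`(m_import = i.m_import, m_export = i.m_export, m_econ = i.m_econ)]

# Trust the partner only when its indicator is strictly higher
# (assumption: on a tie, the reporter's own values are kept).
res[!is.na(m_econ) & m_econ > rep_econ1, `:=`(
  import = -abs(m_export),
  export =  abs(m_import)
)]

# Drop the helper columns; solo records (FRA/CAN) are left untouched.
res[, c("m_import", "m_export", "m_econ") := NULL]
res
```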
It's possible not all dyads are represented twice, so I included a solo record. I prefer data.table but I'll go with anything that leads me down the right path.
Thank you for your time.