4

I have data organized by country-year, with a ID for a dyadic relationship. I want to organize this by dyad-year.

Here is how my data is organized:

     dyadic_id country_codes year
  1          1           200 1990
  2          1            20 1990
  3          1           200 1991
  4          1            20 1991
  5          2           300 1990
  6          2            10 1990
  7          3           100 1990
  8          3            10 1990
  9          4           500 1991
  10         4           200 1991

Here's how I want my data to be organized:

  dyadic_id_want country_codes_1 country_codes_2 year_want
1              1             200              20      1990
2              1             200              20      1991
3              2             300              10      1990
4              3             100              10      1990
5              4             500             200      1991

Here is reproducible code:

dyadic_id<-c(1,1,1,1,2,2,3,3,4,4)
country_codes<-c(200,20,200,20,300,10,100,10,500,200)
year<-c(1990,1990,1991,1991,1990,1990,1990,1990,1991,1991)
mydf<-as.data.frame(cbind(dyadic_id,country_codes,year))

I want mydf to look like df_i_want

dyadic_id_want<-c(1,1,2,3,4)
country_codes_1<-c(200,200,300,100,500)
country_codes_2<-c(20,20,10,10,200)
year_want<-c(1990,1991,1990,1990,1991)
my_df_i_want<-as.data.frame(cbind(dyadic_id_want,country_codes_1,country_codes_2,year_want))
JasonAizkalns
  • 20,243
  • 8
  • 57
  • 116
user46257
  • 129
  • 1
  • 8

1 Answers1

6

We can reshape from 'long' to 'wide' using different methods. Two are described below.

Using 'data.table', we convert the 'data.frame', to 'data.table' (setDT(mydf)), create a sequence column ('ind'), grouped by 'dyadic_id' and 'year'. Then, we convert the dataset from 'long' to 'wide' format using dcast.

library(data.table)
setDT(mydf)[, ind:= 1:.N, by = .(dyadic_id, year)]
dcast(mydf, dyadic_id+year~ paste('country_codes', ind, sep='_'), value.var='country_codes')
#   dyadic_id year country_codes_1 country_codes_2
#1:         1 1990             200              20
#2:         1 1991             200              20
#3:         2 1990             300              10
#4:         3 1990             100              10
#5:         4 1991             500             200

Or using dplyr/tidyr, we do the same i.e. grouping by 'dyadic_id', 'year', create a 'ind' column (mutate(...), and use spread from tidyr to reshape to 'wide' format.

library(dplyr)
library(tidyr)
mydf %>% 
    group_by(dyadic_id, year) %>%
    mutate(ind= paste0('country_codes', row_number())) %>% 
    spread(ind, country_codes)
#    dyadic_id  year country_codes1 country_codes2
#       (dbl) (dbl)          (dbl)          (dbl)
#1         1  1990            200             20
#2         1  1991            200             20
#3         2  1990            300             10
#4         3  1990            100             10
#5         4  1991            500            200
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks! This works for my simulated data, but my actual data is different in a way I didn't realize: the dyadic-ids are actually event-ids. Most of them are between two countries, but some of them are between 10 or 20 countries! So this works for when there are two, but using the dplyr I get a data.frame with 41 columns that accounts for the few events that have a lot of participants. Any advice? – user46257 Nov 02 '15 at 16:51
  • 1
    @user46257 I guess it is better to post an example that mimics the original dataset. If it is very different, then post as a new question. – akrun Nov 02 '15 at 16:53