I need to merge two different datasets. In one dataset, there is a variable called “Dyadpair” which contains a number that represents two different country codes. For example, the Dyadpair “2365” represents the United States (2) and Russia (365).

In my second dataset, there is no “Dyadpair” variable, but I need to create one so that I can merge the two datasets.

In this second dataset, I have a variable called “stateA” (the attacker state) and a variable called “stateB” (the victim state). These states use the same country code format as my first dataset (e.g. “2” represents the United States). However, I am working with 3858 observations, so there are many different country codes.

Considering “stateA” (attacker state), the country codes range from “2” to “940.” Considering “stateB” (victim state), the country codes range from “2” to “986.”

I need to combine “stateA” with “stateB” to get a new “Dyadpair” variable, with the smaller country code first (so that it matches the first dataset). The new variable needs to be a column in that same dataset rather than a separate data frame, because I am not finished working with this dataset. (Next, I will need to aggregate the number of attacks per year for each “Dyadpair”.)

Here is what my dataset looks like (well, it doesn't yet have the variable "dyadpair" - that is what I would like for it to look like):

       incidentnumber   stateA  stateB   year    actiondummy   dyadpair
1      3551005          211     345      1992    1             211345
2      3551002          20      200      1992    1             20200         
3      3551003          390     360      1992    1             360390 
4      3551004          220     2        1992    1             2220   
5      3551005          255     645      1992    1             255645
6      3551006          350     690      1992    1             350690
7161   4598003          770     2        1992    0             2770 
7163   4599001          700     630      1992    0             630700
7164   4599002          700     630      1992    1             630700

I would like to create a new variable called "dyadpair" - which combines "stateA" with "stateB" ... but it is very important that the smaller country code comes first.

user438383
newtoR

2 Answers

Example data:

d <- data.frame("countrycode1" = 1:5,
                "countrycode2" = sample(1:5))

Solution (iterate through each row, sort the country codes, and paste them together):

d$newcodes <- apply(d, 1, function(x) paste(sort(x), collapse = ""))

Update to fit your specific data

df$dyadpair <- apply(df[, c("stateA", "stateB")], 1, function(x) paste0(sort(x), collapse = ""))
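If `stateA` and `stateB` are stored as numbers, a vectorized variant with `pmin()` and `pmax()` avoids the row-wise `apply()` and still adds the column to the existing data frame. The following is only a sketch: the toy rows are copied from the question's example, `lo`, `hi`, `attacks` and `n_attacks` are illustrative names, and the aggregation step assumes that each row is one incident.

```r
# Toy data with the column names from the question (values from its example rows)
df <- data.frame(
  stateA      = c(211, 20, 390, 220, 700, 700),
  stateB      = c(345, 200, 360, 2, 630, 630),
  year        = c(1992, 1992, 1992, 1992, 1992, 1992),
  actiondummy = c(1, 1, 1, 1, 0, 1)
)

# pmin()/pmax() return the smaller/larger code for each row
lo <- pmin(df$stateA, df$stateB)
hi <- pmax(df$stateA, df$stateB)

# Smaller code first; the result is a new character column in df itself
df$dyadpair <- paste0(lo, hi)

# Assumption: one row = one incident, so summing a column of 1s
# counts the incidents per dyad and year
attacks <- aggregate(n_attacks ~ dyadpair + year,
                     data = transform(df, n_attacks = 1),
                     FUN = sum)
```

The new column is character. If `Dyadpair` in the first dataset is numeric, converting with `as.numeric(df$dyadpair)` before merging should make the two keys comparable.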
Brigadeiro
  • Hi, @Brigadeiro - That worked as a separate data frame, but I can't figure out how to add this new “dyadpair” as a variable to my existing data. I tried this: newdata$dyadpair <- data.frame("ccode" = 2:986, "stateB" = sample(2:986)) But this error popped up: Error in `$<-.data.frame`(`*tmp*`, dyadpair, value = list(ccode = 2:986, : replacement has 985 rows, data has 3858 Any suggestions? The reason I need a dyadpair variable is so that it matches perfectly with another dataset, so that I can merge them. – newtoR Sep 07 '19 at 06:41
  • Update your question with example data that represents your data and I will try to help – Brigadeiro Sep 07 '19 at 14:30
  • @newtoR, please add code constructing a `data.frame` that is representative of the data you are working with. See the example in my answer. Also please add an example of the output you are looking for. – Brigadeiro Sep 07 '19 at 17:47
  • I'm sorry, I'm not quite sure how to do that. – newtoR Sep 07 '19 at 19:09
  • @newtoR, see this post for help: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Brigadeiro Sep 07 '19 at 19:26
  • Ok, I think I did what you were asking for. Does that help? – newtoR Sep 08 '19 at 00:04
  • @newtoR, I just updated my post for you. `df` is the name I gave the `data.frame`, so you will have to change that to match what you are calling it in your script. – Brigadeiro Sep 08 '19 at 00:17
  • Great! You can accept my answer so nobody else spends time on it! – Brigadeiro Sep 08 '19 at 01:55

One way to create either directed or undirected dyads would be via merges:

library(data.table)

d <- data.table("countrycode1" = LETTERS[1:5])
# create a "count" variable
d[, n:= 1:nrow(d)]
# create a help variable to merge on 
d[, merge_var := "X"]

head(d)
# countrycode1 n merge_var
# 1:            A 1         X
# 2:            B 2         X
# 3:            C 3         X
# 4:            D 4         X
# 5:            E 5         X

all_merged <- merge(d, d, by = "merge_var", allow.cartesian = TRUE)
head(all_merged)
# merge_var countrycode1.x n.x countrycode1.y n.y
# 1:         X              A   1              A   1
# 2:         X              A   1              B   2
# 3:         X              A   1              C   3
# 4:         X              A   1              D   4
# 5:         X              A   1              E   5
# 6:         X              B   2              A   1

# delete help variable
all_merged[, merge_var := NULL]

# drop self-pairs (e.g. US-US); both orderings (A-B and B-A) remain, i.e. directed dyads
all_merged_sym <- all_merged[n.x != n.y]
head(all_merged_sym)

# countrycode1.x n.x countrycode1.y n.y
# 1:              A   1              B   2
# 2:              A   1              C   3
# 3:              A   1              D   4
# 4:              A   1              E   5
# 5:              B   2              A   1
# 6:              B   2              C   3

# keep each pair only once (A-B but not B-A), i.e. undirected dyads
all_merged_asym <- all_merged[n.x < n.y]
head(all_merged_asym)
# countrycode1.x n.x countrycode1.y n.y
# 1:              A   1              B   2
# 2:              A   1              C   3
# 3:              A   1              D   4
# 4:              A   1              E   5
# 5:              B   2              C   3
# 6:              B   2              D   4
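For the undirected case only (each pair listed once), base R's `combn()` can produce the same pairs without the self-merge. This is a minimal sketch, assuming the codes are unique; `codes`, `pairs` and `undirected` are illustrative names, and the column names simply mirror the merge output above.

```r
# Same toy country codes as in the example above
codes <- LETTERS[1:5]

# combn() returns a 2 x choose(5, 2) matrix; transpose so each row is one pair
pairs <- t(combn(codes, 2))

# One row per unordered pair, first element always the earlier code
undirected <- data.frame(countrycode1.x = pairs[, 1],
                         countrycode1.y = pairs[, 2])
head(undirected)
```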

The merge approach also works when you want to create dyads only within specific groups, e.g. only for countries on the same continent:

d <- data.table("countrycode1" = LETTERS[1:5])
# create a help variable to merge on 
d[, merge_var := c(rep("X1", 3), rep("X2", 2))]
# sort the data set before assigning count variable
setkey(d, "merge_var", "countrycode1")
d[, n:= 1:nrow(d)]

head(d)
# countrycode1 merge_var n
# 1:            A        X1 1
# 2:            B        X1 2
# 3:            C        X1 3
# 4:            D        X2 4
# 5:            E        X2 5

group_merged <- 
  merge(d, d, by = "merge_var", allow.cartesian = TRUE)[n.x != n.y]
group_merged
# merge_var countrycode1.x n.x countrycode1.y n.y
# 1:        X1              A   1              B   2
# 2:        X1              A   1              C   3
# 3:        X1              B   2              A   1
# 4:        X1              B   2              C   3
# 5:        X1              C   3              A   1
# 6:        X1              C   3              B   2
# 7:        X2              D   4              E   5
# 8:        X2              E   5              D   4
A.Fischer