
I have a directed affiliation matrix which I want to convert to an edge list. The matrix looks like this:

State   WarID    Initiator
A        1       1
B        1       0
A        2       1
C        2       0
D        2       0
B        3       1
C        3       1
D        3       0

where "State" is the name of a country, "WarID" is a unique identifier for a war, and "Initiator" is a dummy variable that equals 1 if the state initiated the war. There is an edge between two states if they share the same "WarID" but have different values of "Initiator."

I want to change the affiliation matrix above into an edge list like this:

Initiator   Target  WarID
A              B    1
A              C    2
A              D    2
B              D    3
C              D    3

I know how to change a basic affiliation matrix into an edge list, but I am struggling to keep the direction of the edges (initiator to target). I'd be very grateful if someone could tell me how to do this efficiently in R (my affiliation matrix is pretty large).

jay.sf
Erebonia

3 Answers

1

Using tidyverse you could do:

library(tidyverse)
df %>%
   group_by(WarID) %>%
   summarise(Target = list(State[Initiator==0]),
             Initiator = list(State[Initiator==1]), .groups='drop') %>%
   # unnest one column at a time so every initiator is paired with every target
   unnest(Initiator) %>%
   unnest(Target) %>%
   rev() # Just to reverse the column order, otherwise not necessary

    # A tibble: 5 x 3
  Initiator Target WarID
  <chr>     <chr>  <int>
1 A         B          1
2 A         C          2
3 A         D          2
4 B         D          3
5 C         D          3
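
If you would rather list the initiators first inside summarise(), one option is to give the list columns temporary names and rename them at the end. This is only a sketch of that idea: the names init, targ and edges are my own placeholders, and the data frame is the example from the question rebuilt by hand.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  State     = c("A", "B", "A", "C", "D", "B", "C", "D"),
  WarID     = c(1, 1, 2, 2, 2, 3, 3, 3),
  Initiator = c(1, 0, 1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

edges <- df %>%
  group_by(WarID) %>%
  # init/targ are hypothetical temporary names; because they do not
  # overwrite the Initiator column, the two arguments can come in any order
  summarise(init = list(State[Initiator == 1]),
            targ = list(State[Initiator == 0]),
            .groups = "drop") %>%
  # unnest one list column at a time to get every initiator-target pair
  unnest(init) %>%
  unnest(targ) %>%
  # rename and reorder in one step
  select(Initiator = init, Target = targ, WarID)

print(edges)
```

Because the original 0/1 column is never replaced inside summarise(), both State[Initiator == ...] expressions see the numeric vector.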
Onyambu
  • Onyambu, if I switch Target and Initiator in the summarise function, i.e. give "Initiator =" first and "Target =" second, it throws an error: "x (list) object cannot be coerced to type 'double'". Could you please explain why the order matters here? – Karthik S Oct 22 '20 at 07:16
  • @KarthikS Because once you start with Initiator, you have already replaced the Initiator column, so it is no longer the 0/1 vector by the time Target is computed. To start with Initiator, use a different name, e.g. init = ..., or anything else. – Onyambu Oct 22 '20 at 13:06
0

Does this work:

library(dplyr)
df %>% group_by(WarID) %>% filter(Initiator == 1) %>%
  inner_join(df %>% group_by(WarID) %>% filter(Initiator == 0), by = 'WarID') %>%
  rename(Target = State.y, Initiator = State.x) %>%
  select(1, 4, 2)
# A tibble: 5 x 3
# Groups:   WarID [3]
  Initiator Target WarID
  <chr>     <chr>  <dbl>
1 A         B          1
2 A         C          2
3 A         D          2
4 B         D          3
5 C         D          3

Data used:

> dput(df)
structure(list(State = c("A", "B", "A", "C", "D", "B", "C", "D"
), WarID = c(1, 1, 2, 2, 2, 3, 3, 3), Initiator = c(1, 0, 1, 
0, 0, 1, 1, 0)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), spec = structure(list(cols = list(
    State = structure(list(), class = c("collector_character", 
    "collector")), WarID = structure(list(), class = c("collector_double", 
    "collector")), Initiator = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))
Karthik S
0

You could group the data by WarID and Initiator using tapply, build an expand.grid of initiators and targets for each WarID, and then rbind the results.

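FUN <- function(d) {
  # WarID x Initiator table; each cell holds the State names of that group
  r <- with(d, tapply(State, list(WarID, Initiator), I))
  # per war: cross initiators (column "1") with targets (column "0");
  # the row index i doubles as the WarID
  r <- lapply(1:nrow(r), function(i) cbind(expand.grid(rev(r[i, ])), i))
  # stack the per-war results and name the columns
  r <- setNames(do.call(rbind, r), c("Initiator", "Target", "WarID"))
  r
}
FUN(d)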
#   Initiator Target WarID
# 1         A      B     1
# 2         A      C     2
# 3         A      D     2
# 4         B      D     3
# 5         C      D     3

Notice that I used consecutive WarIDs as specified by you.


Data:

d <- structure(list(State = c("A", "B", "A", "C", "D", "B", "C", "D"
), WarID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), Initiator = c(1L, 
0L, 1L, 0L, 0L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-8L))
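
If your WarIDs are not consecutive, a variant that does not depend on the row index may be easier to adapt. The following is a minimal base R sketch, not a drop-in replacement: it joins the initiator rows to the target rows with merge(), and the function name edge_list, the data frame d2 and its war IDs (10, 25, 31) are made up for illustration.

```r
# Toy data with deliberately non-consecutive war IDs (hypothetical values)
d2 <- data.frame(
  State     = c("A", "B", "A", "C", "D", "B", "C", "D"),
  WarID     = c(10L, 10L, 25L, 25L, 25L, 31L, 31L, 31L),
  Initiator = c(1L, 0L, 1L, 0L, 0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# edge_list is a hypothetical helper name
edge_list <- function(d) {
  # split the rows into initiators and targets
  init <- d[d$Initiator == 1, c("WarID", "State")]
  targ <- d[d$Initiator == 0, c("WarID", "State")]
  names(init)[2] <- "Initiator"
  names(targ)[2] <- "Target"
  # joining on WarID pairs every initiator with every target of the same war
  e <- merge(init, targ, by = "WarID")
  # sort explicitly and put the columns in the requested order
  e <- e[order(e$WarID, e$Initiator, e$Target),
         c("Initiator", "Target", "WarID")]
  rownames(e) <- NULL
  e
}

edges <- edge_list(d2)
print(edges)
```

Wars without an initiator or without a target simply produce no rows here, because merge() keeps only WarIDs present on both sides.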
jay.sf
  • When I run your code, it works for the example data frame but not for my real data. It returns: "Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1". WarID ranges from 1 to 510 in my real data with no gaps, and its frequency ranges from 2 to 15. – Erebonia Oct 22 '20 at 16:40
  • @Erebonia You have just learned how important a reproducible example/data is. You may want to study our guidelines: https://stackoverflow.com/a/5963610/6574038 – jay.sf Oct 22 '20 at 16:44
  • Thanks! I'll include reproducible data next time I ask a question. And I just figured it out: there was a coding error in my real data, and once I fixed that, your code worked perfectly. – Erebonia Oct 22 '20 at 17:16
  • @Erebonia That's cool, and I'm glad I could help. You may now want to [accept an answer](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235) to mark the question _done_. – jay.sf Oct 22 '20 at 17:18