
I am trying to do network analysis in igraph but am having some issues transforming my dataset into an edge list (with weights), given the differing number of columns.

The data set (df1) looks as follows (much larger, of course). The first column is the main operator ID. A main operator can also be a partner and vice versa, so the IDs stay the same in the edge list. The challenge is that the number of partners varies (from 0 to 40) and every interaction has to be considered (not just "IdMain to IdPartnerX").

IdMain IdPartner1  IdPartner2  IdPartner3 IdPartner4 .....
1      4           3           7          6
2      3           1          NA          NA
3      1           4           2          NA
4      9           6           3          NA
.
.

I already got the helpful tip to use reshape2 for this:

data_melt <- reshape2::melt(data, id.vars = "IdMain")
edgelist <- data_melt[!is.na(data_melt$value), c("IdMain", "value")]

However, this only creates a 'directed' edge list (from Main to Partners). What I need is something like the following, where every interaction within a row is recorded:

Id1 Id2 
1   4    
1   3    
1   7    
1   6        
4   3
4   7
4   6
3   7
etc
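To make the requirement concrete: the desired edges for one row are all unordered pairs of its non-missing IDs, which base R's `combn()` enumerates directly. A minimal sketch for row 1 of the example (the name `pairs1` is just for illustration):

```r
# Row 1 of the example: IdMain = 1, partners 4, 3, 7, 6
row1 <- c(1, 4, 3, 7, 6)

# combn() returns a 2 x choose(5, 2) matrix of pairs; t() puts one pair per row
pairs1 <- t(combn(row1, 2))
colnames(pairs1) <- c("Id1", "Id2")
pairs1
```

This gives ten pairs; the first eight are exactly the ones listed above, followed by `3 6` and `7 6`.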

Does anyone have a tip on the best way to do this? I also looked through the igraph library but couldn't find a function for this.

lmo
julia_3010
    It's easier to help you if you provide a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Give some sample input and provide the desired output for that sample input. Ideally in a form we can copy/paste into R to test possible solutions. – MrFlick Aug 17 '17 at 18:35

2 Answers


There is no need for reshape2 and melting. You just need to grab every combination of column pairs and then bind them together.

x <- read.table(text="IdMain IdPartner1  IdPartner2  IdPartner3 IdPartner4
1      4           3           7          6
2      3           1          NA          NA
3      1           4           2          NA
4      9           6           3          NA", header=TRUE)

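# All pairs of column indices: (1,2), (1,3), ..., (4,5)
idx <- t(combn(seq_along(x), 2))
# For each pair of columns, take those two columns from every row
edgelist <- lapply(1:nrow(idx), function(i) x[, c(idx[i, 1], idx[i, 2])])
# Give every two-column piece the same names so they can be row-bound
edgelist <- lapply(edgelist, setNames, c("ID1","ID2"))
edgelist <- do.call(rbind, edgelist)
# Drop pairs where either ID is missing
edgelist <- edgelist[rowSums(is.na(edgelist))==0, ]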
edgelist
#    ID1 ID2
# 1    1   4
# 2    2   3
# 3    3   1
# 4    4   9
# 5    1   3
# 6    2   1
# 7    3   4
# 8    4   6
# 9    1   7
# 11   3   2
# 12   4   3
# 13   1   6
# 17   4   3
# 18   3   1
# 19   1   4
# 20   9   6
# 21   4   7
# 23   1   2
# 24   9   3
# 25   4   6
# 29   3   7 <--
# 31   4   2
# 32   6   3
# 33   3   6 <--
# 37   7   6 <--
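A quick sanity check you can run alongside this answer (a sketch using the same example data): a row with k non-missing IDs contributes choose(k, 2) unordered pairs, so the expected number of edges can be computed up front and compared with `nrow(edgelist)`.

```r
# Same example data as in the answer
x <- read.table(text = "IdMain IdPartner1 IdPartner2 IdPartner3 IdPartner4
1 4 3 7 6
2 3 1 NA NA
3 1 4 2 NA
4 9 6 3 NA", header = TRUE)

# Number of non-missing IDs in each row
k <- rowSums(!is.na(x))

# Each row with k IDs contributes choose(k, 2) unordered pairs
expected_edges <- sum(choose(k, 2))
expected_edges
```

For the four example rows this is 10 + 3 + 6 + 6 = 25, which matches the 25 rows printed above.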
emilliman5
  • Thanks @emilliman5! Your solution works; however, the created edgelist only takes `IdMain` and records the connections it has with each `IdPartnerX`. What I would ideally like is all the connections recorded. E.g., for row 1, the result should also contain `3,7; 3,6; 7,6` etc. – julia_3010 Aug 17 '17 at 19:31
  • My result contains all pairs for each row, minus those with an NA. – emilliman5 Aug 17 '17 at 19:37
  • Ah, my bad, I ran it again and it worked perfectly. Thank you! – julia_3010 Aug 17 '17 at 19:43

Using the data below, you can achieve what looks to be your goal with apply and combn. This returns a list of matrices, each holding the pairwise combinations of the non-missing elements in one row of your data.frame:

 myPairs <- apply(t(dat), 2, function(x) t(combn(x[!is.na(x)], 2)))

Note that the output of apply can be finicky: here the rows must differ in their number of non-missing IDs (in the example, because of the NAs) so that the per-row results differ in size and apply returns a list rather than simplifying them into a matrix.

If you want a data.frame at the end, use do.call and rbind to stack the matrices, then data.frame to coerce the result and setNames to add the column names:

setNames(data.frame(do.call(rbind, myPairs)), c("Id1", "Id2"))
   Id1 Id2
1    1   4
2    1   3
3    1   7
4    1   6
5    4   3
6    4   7
7    4   6
8    3   7
9    3   6
10   7   6
11   2   3
12   2   1
13   3   1
14   3   1
15   3   4
16   3   2
17   1   4
18   1   2
19   4   2
20   4   9
21   4   6
22   4   3
23   9   6
24   9   3
25   6   3
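Because the apply result depends on the NA pattern, a variant that always returns a list may be safer. The sketch below is an alternative to the answer's one-liner, not the answer itself; it loops with lapply over row indices and adds a guard (the helper `row_pairs` is hypothetical). The guard matters for rows with no partners: `combn(n, 2)` on a single positive integer `n` enumerates pairs of `seq_len(n)`, so a lone `IdMain` would otherwise produce spurious edges.

```r
# Same example data as in the answer
dat <- data.frame(IdMain     = 1:4,
                  IdPartner1 = c(4L, 3L, 1L, 9L),
                  IdPartner2 = c(3L, 1L, 4L, 6L),
                  IdPartner3 = c(7L, NA, 2L, 3L),
                  IdPartner4 = c(6L, NA, NA, NA))

# Hypothetical helper: all unordered pairs of the non-missing IDs in one row
row_pairs <- function(ids) {
  ids <- ids[!is.na(ids)]
  # Fewer than two IDs means no pair; also avoids combn(n, 2) expanding n to 1:n
  if (length(ids) < 2) return(matrix(integer(0), ncol = 2))
  t(combn(ids, 2))
}

m <- as.matrix(dat)
# lapply over row indices always returns a list, whatever the NA pattern
myPairs <- lapply(seq_len(nrow(m)), function(i) row_pairs(m[i, ]))
edges <- setNames(data.frame(do.call(rbind, myPairs)), c("Id1", "Id2"))
edges
```

On R 4.1.0 or later, passing `simplify = FALSE` to apply should also force a list, but the lapply form works on older versions too and makes the zero-partner case explicit.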

Data:

dat <- 
structure(list(IdMain = 1:4, IdPartner1 = c(4L, 3L, 1L, 9L), 
    IdPartner2 = c(3L, 1L, 4L, 6L), IdPartner3 = c(7L, NA, 2L, 
    3L), IdPartner4 = c(6L, NA, NA, NA)), .Names = c("IdMain", 
"IdPartner1", "IdPartner2", "IdPartner3", "IdPartner4"),
class = "data.frame", row.names = c(NA, -4L))
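The question also asks for weights. One possible reading, assumed here rather than stated in the thread, is that edges are undirected and the weight is the number of times a pair co-occurs across rows. A minimal sketch with base R, typing in only rows 11 to 16 of the output printed above (the names `und` and `weighted` are hypothetical):

```r
# Rows 11-16 of the edge list printed above
edges <- data.frame(Id1 = c(2L, 2L, 3L, 3L, 3L, 3L),
                    Id2 = c(3L, 1L, 1L, 1L, 4L, 2L))

# Treat edges as undirected: put the smaller ID first so (3, 1) and (1, 3) match
und <- data.frame(Id1 = pmin(edges$Id1, edges$Id2),
                  Id2 = pmax(edges$Id1, edges$Id2))

# Weight = number of times each unordered pair occurs
und$weight <- 1L
weighted <- aggregate(weight ~ Id1 + Id2, data = und, FUN = sum)
weighted
```

Here (1, 3) and (2, 3) each get weight 2, while (1, 2) and (3, 4) get weight 1. A data frame like `weighted` can then be passed to `igraph::graph_from_data_frame(weighted, directed = FALSE)`, which keeps the extra `weight` column as an edge attribute.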
lmo
  • Thanks @lmo! For some reason I don't seem to be able to reproduce your results; I also get an edgelist where only the connections from `IdMain` to `PartnerX` are recorded. – julia_3010 Aug 17 '17 at 19:38
  • @julia_3010 If you copy the data and then also copy and run the two lines of code, you get the results that are printed in my answer. I am not exactly sure about your second statement, but the first 10 lines of my output include all of those you have in your desired output. – lmo Aug 17 '17 at 19:45