
I have a large amount of graph data in the following form. Suppose a person has multiple interests.

person,interest
1,1
1,2
1,3
2,1
2,5
2,2
3,2
3,5
...

For each person, I want to construct all pairs of interests and convert them into an edge list like the following. I want the data in this format so that I can convert it into an adjacency matrix for graphing, etc.

person,x_interest,y_interest
1,1,2
1,1,3
1,2,3
2,1,5
2,1,2
2,5,2
3,2,5

There is one solution in "Pairs of Observations within Groups", but it works only for small datasets because the call to table tries to generate more than 2^31 elements. Is there another way to do this without relying on table?

Ryan R. Rosario
1 Answer


We can use data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1) and then, grouped by 'person', take the unique pairwise combinations of 'interest' to create two columns ('x_interest' and 'y_interest').

 library(data.table)
 # combn() returns a 2-row matrix: row 1 holds the first member of each pair, row 2 the second
 setDT(df1)[, {tmp <- combn(unique(interest), 2)
        list(x_interest = tmp[1, ], y_interest = tmp[2, ])}, by = person]

NOTE: To speed up, combnPrim from library(gRbase) could be used in place of combn.
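One thing to watch for on real data: when its first argument is a single positive number n, combn treats it as seq_len(n), so a person with exactly one interest either raises an error or yields pairs that are not in the data. Below is a minimal sketch of a guarded variant, assuming data.table is installed. The helper pairs_for and the data frame df2 (the sample data plus a hypothetical person 4 with a single interest) are illustrative names, not part of the original answer. Whether this is related to the failure reported in the comments is not known.

```r
library(data.table)

# Hypothetical sample: the data from this answer plus person 4,
# who has only one interest and therefore no pairs.
df2 <- data.frame(
  person   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L),
  interest = c(1L, 2L, 3L, 1L, 5L, 2L, 2L, 5L, 7L)
)

# Illustrative helper: all unordered pairs of the unique values in v.
# It combines positions rather than values, so a lone interest such as 7
# is never expanded to seq_len(7) by combn().
pairs_for <- function(v) {
  u <- unique(v)
  if (length(u) < 2L) return(NULL)   # no pairs; the group is dropped
  idx <- combn(length(u), 2L)        # 2-row matrix of position pairs
  list(x_interest = u[idx[1L, ]], y_interest = u[idx[2L, ]])
}

# as.data.table() copies, so df2 itself is left unchanged.
res <- as.data.table(df2)[, pairs_for(interest), by = person]
```

On this sample, res should reproduce the edge list from the question, with person 4 omitted because a single interest forms no pair.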

data

df1 <- structure(list(person = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
    interest = c(1L, 2L, 3L, 1L, 5L, 2L, 2L, 5L)),
    .Names = c("person", "interest"),
    class = "data.frame", row.names = c(NA, -8L))
akrun
  • This fails on my actual large dataset. The problem appears to be in combn and combnPrim: "Error in rep.int(0L, NSET * NSEL) : invalid 'times' value". In addition: "Warning message: In combnPrim(unique(interest), 2) : NAs introduced by coercion" – Ryan R. Rosario Nov 04 '15 at 07:15
  • @RyanRosario I couldn't reproduce the error. If you can show a small reproducible example that shows the error, I can look into it – akrun Nov 04 '15 at 07:23