I'm trying to convert a data set in a long-format panel structure into an adjacency matrix or edge list so that I can draw network graphs. The data set contains articles, each identified by an ID number. Each article can appear several times, under a number of categories. Hence I have a long-format structure at the moment:

ID <- c(1,1,1,2,2,2,3,3)
Category <- c("A","B","C","B","E","H","C","E")
dat <- data.frame(ID,Category)

I want to convert this into an adjacency matrix or edge list, where the edge list should look something like this:

A B
A C
B C
B E
B H
E H
C E 

Edit: I have tried dat <- merge(ID, Category, by="Category"), but it returns the error message: Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

Thanks in advance

Update: I ended up using crossprod(table(dat)) from the comments, but the solution suggested by Navy Cheng below works just as well.
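
For reference, here is a minimal sketch of the crossprod(table(dat)) approach on the sample data. Zeroing the diagonal and reading the edge list off the upper triangle are assumptions added for illustration; they are not part of the original comment.

```r
ID <- c(1, 1, 1, 2, 2, 2, 3, 3)
Category <- c("A", "B", "C", "B", "E", "H", "C", "E")
dat <- data.frame(ID, Category)

# table(dat) is an ID x Category incidence table; its cross-product
# counts, for every pair of categories, how many IDs they share
adj <- crossprod(table(dat))

# The diagonal counts how many IDs each category appears under;
# set it to zero so there are no self-loops (assumed to be unwanted)
diag(adj) <- 0

# Read an undirected edge list off the upper triangle
idx <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)
edges <- cbind(rownames(adj)[idx[, 1]], colnames(adj)[idx[, 2]])
```

Here adj is a Category-by-Category matrix, so it can serve as a weighted adjacency matrix if the shared-ID counts matter.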

2 Answers

This code will work:

do.call(rbind,lapply(split(dat, dat$ID), function(x){
   t(combn(as.vector(x$Category), 2))
}))

Update

Following @Parfait's suggestion, you can use by instead of split + lapply.

1) Use by to group the nodes ("A", "B", "C", ...) by ID;

2) Use combn to create an edge between every pair of nodes in each group, and t to transpose the result so that each row is one edge, ready for rbind

> edge.list <- by(dat, dat$ID, function(x) t(combn(as.vector(x$Category), 2)))
> edge.list

dat$ID: 1
     [,1] [,2]
[1,] "A"  "B" 
[2,] "A"  "C" 
[3,] "B"  "C" 
------------------------------------------------------------ 
dat$ID: 2
     [,1] [,2]
[1,] "B"  "E" 
[2,] "B"  "H" 
[3,] "E"  "H" 
------------------------------------------------------------ 
dat$ID: 3
     [,1] [,2]
[1,] "C"  "E" 

3) Then merge the list

> do.call(rbind, edge.list)

     [,1] [,2]
[1,] "A"  "B" 
[2,] "A"  "C" 
[3,] "B"  "C" 
[4,] "B"  "E" 
[5,] "B"  "H" 
[6,] "E"  "H" 
[7,] "C"  "E"
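
One caveat, sketched under assumptions: combn(x, 2) raises an error when a group holds a single category, and the same pair can appear under several IDs. The variant below is only an illustration, not part of the answer above. The extra rows for IDs 4 and 5 and the names pairs_per_id and edges are hypothetical. It skips groups with fewer than two categories and drops repeated edges.

```r
# Sample data plus two hypothetical articles: ID 4 has a single
# category, ID 5 repeats the A-B pair in reverse order
ID <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5)
Category <- c("A", "B", "C", "B", "E", "H", "C", "E", "A", "B", "A")
dat <- data.frame(ID, Category, stringsAsFactors = FALSE)

pairs_per_id <- lapply(split(dat$Category, dat$ID), function(cats) {
  # Sort so that each undirected edge is always written the same way
  cats <- sort(unique(cats))
  # combn(cats, 2) would fail here, so skip single-category groups
  if (length(cats) < 2) return(NULL)
  t(combn(cats, 2))
})

# rbind ignores the NULL entries; unique() removes duplicated rows
edges <- unique(do.call(rbind, pairs_per_id))
```

Sorting within each group treats the graph as undirected; drop the sort if the order of categories carries meaning.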
Navy Cheng

If you are willing to convert your data.frame to a data.table, this problem can be solved efficiently and cleanly, and with many rows it should be much faster.

    library(data.table)
    dat<-data.table(dat)

Basically, you can apply functions to columns of the data.table in the j argument and group with the by argument. So you want all the combinations of categories, taken two at a time, for each ID, which looks like this:

    dat[,combn(Category,2),by=ID]

However, stopping at this point keeps the ID column and, by default, creates a column called V1 that concatenates the matrix returned by combn into a single vector of categories, rather than the two-column edge list that you need. By chaining a second call you can rebuild the matrix, as you would from any single vector. In one line of code this looks like:

    dat[,combn(Category,2),by=ID][,matrix(V1,ncol=2,byrow = TRUE)]

Remember that the vector column we want to convert to a matrix is called V1 by default, and that the 2-column matrix must be filled by row rather than by column, which is the default. Hope that helps, and let me know if I need to add anything to my explanation. Good luck!
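
To see why filling by row matters, here is a small base-R sketch for a single group. It assumes the combn result is flattened column by column, as as.vector does; the names m, v, by_row and by_col are only illustrative.

```r
# combn returns a 2 x k matrix: one column per pair
m <- combn(c("A", "B", "C"), 2)

# Flattening goes down the columns, so the members of each pair
# end up next to each other: "A" "B" "A" "C" "B" "C"
v <- as.vector(m)

# Filling by row puts each consecutive pair on its own row
by_row <- matrix(v, ncol = 2, byrow = TRUE)

# The default (fill by column) splits the pairs across rows
by_col <- matrix(v, ncol = 2)
```

With byrow = TRUE the rows are A-B, A-C and B-C. With the default, the second row would be "B" "B", which is not a valid edge.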

Jason Johnson