
I was wondering if you could help me build an adjacency matrix. I have data in CSV format like this:

Paper_ID    Author
2   Foster-McGregor, N.
3   Van Houte, M.
4   van de Meerendonk, A.
5   Farla, K.
6   van Houte, M.
6   Siegel, M.
8   Farla, K.
11  Farla, K.
11  Verspagen, B.

As you can see, the column "Paper_ID" has repeated values. For example, 11 appears twice, meaning that "Farla, K." and "Verspagen, B." are coauthors of the same publication. I need to build a square weighted matrix with the authors' names as rows and columns, counting the number of times each pair of authors has collaborated.
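As a starting point, the data above can be loaded into R with read.csv(). This is a minimal sketch that assumes the file is comma-separated with the author names quoted (they contain commas); the inline text stands in for a hypothetical file called papers.csv.

```r
# Inline CSV text mirroring the question's data. In a real CSV file the
# author names must be quoted because they contain commas.
csv_text <- 'Paper_ID,Author
2,"Foster-McGregor, N."
3,"Van Houte, M."
4,"van de Meerendonk, A."
5,"Farla, K."
6,"van Houte, M."
6,"Siegel, M."
8,"Farla, K."
11,"Farla, K."
11,"Verspagen, B."'

# With a real file you would call read.csv("papers.csv") instead
# ("papers.csv" is a hypothetical file name).
papers <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Number of authors per paper; only papers with more than one author
# contribute coauthorship links to the adjacency matrix.
authors_per_paper <- table(papers$Paper_ID)
authors_per_paper[authors_per_paper > 1]
```

Note that "Van Houte, M." and "van Houte, M." differ only in case, so they are treated as two different authors unless the names are normalized first (for example with tolower()).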

Mario GS

1 Answer

Does the following do what you are looking for?

# simulate data.
d <- data.frame(
  id=c(2,3,4,5,6,6,8,11,11,12,12),
  author=c("FN", "VM","VA","FK","VM","SM","FK","FK","VB","FK","VB")
)

d
   id author
1   2     FN
2   3     VM
3   4     VA
4   5     FK
5   6     VM
6   6     SM
7   8     FK
8  11     FK
9  11     VB
10 12     FK
11 12     VB

# create incidence matrix:
m <- xtabs(~author+id,d)
m
      id
author 2 3 4 5 6 8 11 12
    FK 0 0 0 1 0 1  1  1
    FN 1 0 0 0 0 0  0  0
    SM 0 0 0 0 1 0  0  0
    VA 0 0 1 0 0 0  0  0
    VB 0 0 0 0 0 0  1  1
    VM 0 1 0 0 1 0  0  0

# convert to adjacency matrix.
# tcrossprod does "m %*% t(m)"
tcrossprod(m)
      author
author FK FN SM VA VB VM
    FK  4  0  0  0  2  0
    FN  0  1  0  0  0  0
    SM  0  0  1  0  0  1
    VA  0  0  0  1  0  0
    VB  2  0  0  0  2  0
    VM  0  0  1  0  0  2
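On its diagonal, tcrossprod(m) holds the number of papers each author appears on (FK has 4), not collaborations. If only coauthorship counts are wanted, a small sketch on the same simulated data simply zeroes the diagonal:

```r
# Same simulated data as in the answer above
d <- data.frame(
  id = c(2, 3, 4, 5, 6, 6, 8, 11, 11, 12, 12),
  author = c("FN", "VM", "VA", "FK", "VM", "SM", "FK", "FK", "VB", "FK", "VB")
)

# Author-by-paper incidence matrix
m <- xtabs(~ author + id, d)

# Author-by-author matrix (m %*% t(m))
adj <- tcrossprod(m)

# For a 0/1 incidence matrix the diagonal is the number of papers per author
papers_per_author <- diag(adj)

# Zero the diagonal so the matrix only holds collaboration counts
diag(adj) <- 0
adj
```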

Note that crossprod() will instead give you the adjacency matrix for the id variable (i.e. it computes t(m) %*% m): each entry counts how many authors a pair of papers has in common.
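To see what crossprod() returns, here is a small sketch on the same simulated data. Each off-diagonal entry counts the authors a pair of papers has in common, and the diagonal counts the authors on each paper.

```r
# Same simulated data as in the answer above
d <- data.frame(
  id = c(2, 3, 4, 5, 6, 6, 8, 11, 11, 12, 12),
  author = c("FN", "VM", "VA", "FK", "VM", "SM", "FK", "FK", "VB", "FK", "VB")
)
m <- xtabs(~ author + id, d)

# Paper-by-paper matrix: t(m) %*% m
shared <- crossprod(m)
shared
```

For example, papers 11 and 12 share two authors (FK and VB), while papers 3 and 6 share one (VM).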

ddiez
  • Dear ddiez, I'm having some trouble with the crossprod command. I ran it on a data set similar to the one you suggested: d <- data.frame(id=c(2,3,4,5,6,6,8,11,11,11,12,12,12),author=c("FN","VM","VA","FK","VM","SM","FK","FK","VB","VA","FK","VB","VA")) – Mario GS Dec 12 '14 at 10:44
  • What is your problem? I have none with this data you show. – ddiez Dec 12 '14 at 14:32
  • Also, it is not a good idea to edit your post to ask another question. If you have a different question, write another post. Otherwise, your edits may make the posted answers irrelevant or incorrect. Edits are supposed to be made to improve, expand or clarify posts (either questions or answers). – ddiez Dec 12 '14 at 14:35
  • Dear @ddiez, the commands that you suggested worked fine for counting the number of collaborations between authors. The problem now is that I need to weight the matrix according to the number of collaborators in each paper. I rephrased my question; could you please have a look at it again? – Mario GS Dec 12 '14 at 14:37
  • As I mentioned, you completely changed the original question. That is not how this site works. Please take a look at the help center on how to ask a good question. I suggest you revert your edits and ask your new question in a new post. – ddiez Dec 12 '14 at 14:41
  • Dear @ddiez, I'm sorry, but I don't have a backup of my original question. Is it possible that you can help me this time, or should I change the title of the question instead? – Mario GS Dec 15 '14 at 15:04
  • Yes you have a backup. Click on the "edited Dec 12 at 11:33" near your badge in the question and see if you can revert to a previous version from there. If you can revert it, then post a new question with your current problem (you can also link this post, instead of repeating everything at the beginning). – ddiez Dec 16 '14 at 10:21
  • Dear @ddiez, I already rolled back the changes and posted a new question as you requested. This is the link "http://stackoverflow.com/questions/27504466/building-a-weighed-adjacency-matrix-with-r-and-igraph". I hope you can help me out. – Mario GS Dec 16 '14 at 12:36
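For the follow-up about weighting by the number of collaborators per paper, one common choice (the scheme M. E. J. Newman proposed for coauthorship networks) divides each paper's contribution by its number of authors minus one, so w[i, j] is the sum over shared papers k of 1 / (n_k - 1). The sketch below assumes that scheme, which may not be the one the linked question settles on, and uses the data from Mario GS's comment. Single-author papers are dropped because they create no links and would divide by zero.

```r
# Data from the comment above
d2 <- data.frame(
  id = c(2, 3, 4, 5, 6, 6, 8, 11, 11, 11, 12, 12, 12),
  author = c("FN", "VM", "VA", "FK", "VM", "SM", "FK", "FK", "VB", "VA",
             "FK", "VB", "VA")
)

# Author-by-paper incidence matrix
m2 <- xtabs(~ author + id, d2)

# Number of authors on each paper; keep only multi-author papers
n_authors <- colSums(m2)
multi <- n_authors > 1
m_multi <- m2[, multi, drop = FALSE]

# Diagonal matrix of per-paper weights 1 / (n_k - 1); nrow is given
# explicitly so diag() also behaves when only one paper is kept
weights <- diag(1 / (n_authors[multi] - 1), nrow = sum(multi))

# w[i, j] = sum_k m[i, k] * m[j, k] / (n_k - 1)
w <- m_multi %*% weights %*% t(m_multi)

# Self-links are not collaborations
diag(w) <- 0
round(w, 2)
```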