I have raw data on people who work for several universities at the same time, e.g.:

                UniA  UniB  UniC  UniD
individual_A    X     NA     X     NA
individual_B    NA     X     NA     X
individual_C    NA     X     NA    NA
individual_D    X      X      X    NA

I am trying to use this data to build a weighted, undirected network between universities. In other words, I would like to generate an adjacency matrix like the one below, which corresponds to the example above:

       UniA UniB UniC UniD
UniA     0    1    2    0
UniB          1    1    1
UniC               0    0 
UniD                    0

How can this be done in R? Any tips or pointers would be most appreciated.

Thank you in advance for your time and help.


EDIT: Can you help me reshape the data below?

              position1   position2  position3 position4
individual_A   UniA        UniC          NA       NA
individual_B   UniB        UniD          NA       NA
individual_C   UniB        NA            NA       NA
individual_D   UniA        UniB          UniC     NA

I tried to use melt() and cast() from the reshape package to convert the data to the form I showed before:

                UniA  UniB  UniC  UniD
individual_A    X     NA     X     NA
individual_B    NA     X     NA     X
individual_C    NA     X     NA    NA
individual_D    X      X      X    NA

However, the values in the raw data are actually strings (UniA, UniB, ...), and the transformation was not successful. Please help.

user20650
  • Anything you've already tried? – Heroka Nov 24 '15 at 11:12
  • @Heroka sorry, I have no idea. I tried comparing only two columns at a time, following the solution under [link](http://stackoverflow.com/questions/14849835/how-to-calculate-adjacency-matrices-in-r), but it failed. – Lingyu Kong Nov 24 '15 at 23:23

1 Answer

A possible solution, assuming that the UniB diagonal value should be zero, not one.

Data

dat <- read.table(header=TRUE, text="                UniA  UniB  UniC  UniD
individual_A    X     NA     X     NA
individual_B    NA     X     NA     X
individual_C    NA     X     NA    NA
individual_D    X      X      X    NA")

Calculation

out <- crossprod(!is.na(dat))
diag(out) <- 0

If you want the lower triangle to be zero

out[lower.tri(out)] <- 0

Explanation

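!is.na(dat) creates a logical matrix that records whether each value is present or missing (internally this is equivalent to ones and zeros). crossprod() then computes the cross product of that matrix with itself, so each entry counts the individuals shared by a pair of universities. Finally, you can overwrite the diagonal values with the assignment form diag(out) <-.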
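To see why the cross product counts shared individuals, here is a minimal sketch that writes the same 0/1 matrix out by hand and compares crossprod() with the explicit product t(inc) %*% inc. The names inc, co and explicit are only illustrative and are not part of the code above.

```r
# Incidence matrix: rows = individuals, columns = universities (1 = works there)
inc <- matrix(c(1, 0, 1, 0,
                0, 1, 0, 1,
                0, 1, 0, 0,
                1, 1, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("individual_", LETTERS[1:4]),
                              paste0("Uni", LETTERS[1:4])))

# crossprod(inc) computes t(inc) %*% inc
co <- crossprod(inc)
explicit <- t(inc) %*% inc
all.equal(co, explicit)

# Off-diagonal [i, j]: sum over individuals of inc[, i] * inc[, j],
# which is 1 only when the individual is at both universities
co["UniA", "UniC"]   # individuals A and D are at both

# Diagonal [i, i]: number of individuals at university i
diag(co)
```

Each product inc[k, i] * inc[k, j] is 1 only when individual k is at both universities, so summing over k gives the tie weight.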


Okay, re your comments: there appear to be two rules used to fill the adjacency matrix. 1) The off-diagonals record the number of individuals who attend each pair of universities. 2) A diagonal entry is non-zero if that university is the only one attended by some individual (even though other individuals may attend it alongside other universities). I have assumed that the value it takes is the number of individuals who have it as their only university.

So following from before

d <- !is.na(dat)
out <- crossprod(d)
diag(out) <- 0

id <- rowSums(d)==1 # which individuals only attend one uni
mx <- max.col(d, "first")  # if there is only one attended which uni?
tab <- table(mx[id])
diag(out)[as.numeric(names(tab))] <- tab
out
#     UniA UniB UniC UniD
#UniA    0    1    2    0
#UniB    1    1    1    1
#UniC    2    1    0    0
#UniD    0    1    0    0

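To reshape your data (here dat is the data frame with the position1 to position4 columns from your edit)

library(reshape2)
dat$id <- rownames(dat)                   # keep the individual names as a column
m <- melt(dat, id="id", na.rm=TRUE)[-2]   # long format; drop the position column
table(m)                                  # rows are individuals, columns are universities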
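As an alternative, here is a sketch of the same reshape in base R, carried through to the adjacency matrix. The names pos, long, inc and adj are illustrative, and it assumes the position data looks exactly like the example in the question's edit.

```r
# Position data from the question's edit (first column becomes the row names)
pos <- read.table(header = TRUE, stringsAsFactors = FALSE,
                  text = "              position1   position2  position3 position4
individual_A   UniA        UniC          NA       NA
individual_B   UniB        UniD          NA       NA
individual_C   UniB        NA            NA       NA
individual_D   UniA        UniB          UniC     NA")

# Long format: one row per (individual, university) pair.
# unlist() stacks the columns one after another, so the row names
# are repeated once per column to stay aligned.
long <- data.frame(id  = rep(rownames(pos), times = ncol(pos)),
                   uni = unlist(pos, use.names = FALSE),
                   stringsAsFactors = FALSE)
long <- long[!is.na(long$uni), ]

# Individuals x universities incidence matrix
inc <- unclass(table(long$id, long$uni))
inc[inc > 0] <- 1L   # guard against an individual listing a university twice

# Universities x universities adjacency matrix, as in the calculation above
adj <- crossprod(inc)
diag(adj) <- 0
adj
```

Unlike the X/NA table, this incidence matrix is already 0/1, so it can go straight into crossprod() without the !is.na() step.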
user20650
  • Thanks for your very creative answer! I would never have thought of a solution like this, since I am terrible at LA. But the diagonal value is also crucial, because in my data there are people who only work for one university. The diagonal values indicate such cases. Any idea? Cheers – Lingyu Kong Nov 24 '15 at 23:30
  • You're welcome. Could you explain how you get the diagonal in your expected outcome above please, for example, why is UniA = 0 and UniB = 1? Thanks. [PS: look at the diagonal of `crossprod(!is.na(dat))`; this gives the number of individuals at each uni.] – user20650 Nov 24 '15 at 23:44
  • What I am concerned with is the ties between universities. E.g. individual_B works for UniB and UniD, so dat[2,4]=1. But considering that individual_C only works for UniB, it's like a tie between UniB and itself. This is the reason why dat[2,2]=1. What I mean by the diagonal value is that it shows the case where people work for only one university. But anyway, your solution is very useful for me. 90% of the work has been done! I wanted to ask, how do you know that the cross product can solve the problem? Is there a network theory behind it, or is my LA knowledge just poor? ;-) Cheers – Lingyu Kong Nov 25 '15 at 00:18
  • Okay, thanks for the info. Just to check then: the off-diagonals represent the number of individuals who attend both unis (which is what we have), but the diagonal is a different measure, recorded as one if it is the only uni that some individual attends (even though multiple individuals may attend it), and zero otherwise? Should it be recorded as two if it is the only uni attended by two individuals? [PS: it does seem a bit confusing to mix the measures here, as the weights will mean different things.] – user20650 Nov 25 '15 at 01:06
  • yes, I guess you are right. I was confused by the mixed measures. Thanks for your reminder. – Lingyu Kong Nov 25 '15 at 02:52
  • sorry to trouble you again – Lingyu Kong Nov 25 '15 at 03:20
  • Looks okay to me. That is some creative way to get the expected output. – akrun Nov 25 '15 at 04:30