2

I have the following 6x10 matrix, where the rows are members of parliament and the columns are issues they voted on.

> print(a)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    1    1    0    0     1
[2,]   NA    1    1    0    0    1    1    1    0     0
[3,]    0    0    0   NA    1   NA    0    1    1     1
[4,]    0    1    1   NA    0    1    1    1    1     0
[5,]    0    0    0    1    0    0    1    1    0    NA
[6,]    1    1    0    0    1    1    1    0    0    NA

I am trying to write a for loop that produces a matrix containing the agreement rate between each pair of members of parliament i and j. The agreement rate is the number of issues on which i and j voted the same way, divided by the number of issues on which both i and j voted.
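To make the definition concrete, here is a minimal sketch for a single pair (members 1 and 2 from the matrix above). The names `m1`, `m2`, `both_voted`, `agreed` and `rate` are illustrative only.

```r
# Votes of members 1 and 2, copied from rows 1 and 2 of the matrix
m1 <- c(0, 0, 0, 0, 0, 1, 1, 0, 0, 1)
m2 <- c(NA, 1, 1, 0, 0, 1, 1, 1, 0, 0)

# Issues on which both members cast a vote
both_voted <- !is.na(m1) & !is.na(m2)

# Issues on which both voted and voted the same way
# (FALSE & NA is FALSE, so missing votes never count as agreement)
agreed <- both_voted & (m1 == m2)

rate <- sum(agreed) / sum(both_voted)  # 5 agreements out of 9 shared votes
rate
```

Member 2 did not vote on issue 1, so only 9 of the 10 issues enter the denominator.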

The code below seems to work when I run it on the 2nd and 3rd rows, but it does not work on the 5th and 6th rows (which have NAs in the same position), and it gives an error when run in the loop: "Error in b[j, i] <- length(which(a[i, ] == a[j, ]))/ifelse(which(is.na(a[i, : replacement has length zero"

How can I fix the error? If someone could suggest a more efficient way of calculating the agreement rate, that would be greatly appreciated!

b <- matrix(nrow=6, ncol=6)

for (i in 1:nrow(a)) {
  for (j in 1:nrow(a)) {
    b[j, i] <- length(which(a[i,] == a[j,]))/
      ifelse(which(is.na(a[i,])) %in% which(is.na(a[j,]))==0,
             length(a[i,]) - (length(which(is.na(a[i,]))) + length(which(is.na(a[j,])))),
             length(a[i,]) - (length(which(is.na(a[i,])) %in% which(is.na(a[j,]))) +
               length(!(which(is.na(a[i,])) %in% which(is.na(a[j,]))))) +
               length(!(which(is.na(a[j,])) %in% which(is.na(a[i,])))))
      }
}

The result should look like this:

          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
[1,] 1.0000000 0.5555556 0.5000000 0.4444444 0.6666667 0.6666667
[2,] 0.5555556 1.0000000 0.1428571 0.8750000 0.5000000 0.6250000
[3,] 0.5000000 0.1428571 1.0000000 0.3750000 0.5714286 0.2857143
[4,] 0.4444444 0.8750000 0.3750000 1.0000000 0.5000000 0.3750000
[5,] 0.6666667 0.5000000 0.5714286 0.5000000 1.0000000 0.3333333
[6,] 0.6666667 0.6250000 0.2857143 0.3750000 0.3333333 1.0000000

Calculated by hand:

result <- matrix(nrow=6, ncol=6, c(1, 5/9, 4/8, 4/9, 6/9, 6/9,
                                 5/9, 1, 1/7, 7/8, 4/8, 5/8,
                                 4/8, 1/7, 1, 3/8, 4/7, 2/7,
                                 4/9, 7/8, 3/8, 1, 4/8, 3/8,
                                 6/9, 4/8, 4/7, 4/8, 1, 3/9,
                                 6/9, 5/8, 2/7, 3/8, 3/9, 1))
snitsova
  • I'm not sure what result you want, but maybe a cross-product does it - `crossprod(replace(mat, is.na(mat), 0))` ? – thelatemail Sep 01 '20 at 22:32
  • Previous related discussion here - https://stackoverflow.com/questions/19977596/how-do-i-calculate-the-co-occurrence-in-the-table – thelatemail Sep 01 '20 at 22:39
  • The result should be a symmetric 6x6 matrix with values ranging between 0 and 1, 1's on the diagonal. – snitsova Sep 01 '20 at 22:42
  • I'm absolutely stumped on calculating the denominator in any simple fashion, but I think I can get the numerator with `tcp <- function(x) tcrossprod(replace(x, is.na(x), 0)); tcp(mat) + tcp(!mat)` – thelatemail Sep 02 '20 at 08:17
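The cross-product idea in the comments above can, as far as I can tell, be extended to the denominator as well: a cross-product of the "did vote" indicator counts the issues on which both members voted. The following is a minimal sketch under that assumption, not a tested production solution; the names `voted`, `yes`, `no`, `agree`, `both` and `rates` are illustrative.

```r
# Vote matrix from the question (rows = members, columns = issues)
a <- matrix(c( 0, 0, 0,  0, 0,  1, 1, 0, 0,  1,
              NA, 1, 1,  0, 0,  1, 1, 1, 0,  0,
               0, 0, 0, NA, 1, NA, 0, 1, 1,  1,
               0, 1, 1, NA, 0,  1, 1, 1, 1,  0,
               0, 0, 0,  1, 0,  0, 1, 1, 0, NA,
               1, 1, 0,  0, 1,  1, 1, 0, 0, NA),
            nrow = 6, byrow = TRUE)

voted <- !is.na(a)                  # TRUE where a vote was cast
yes   <- replace(a, !voted, 0)      # 1 for a Yes vote, 0 otherwise
no    <- replace(1 - a, !voted, 0)  # 1 for a No vote, 0 otherwise

# Numerator: issues where i and j both said Yes, plus issues where both said No
agree <- tcrossprod(yes) + tcrossprod(no)

# Denominator: issues where i and j both voted (* 1 turns TRUE/FALSE into 1/0)
both <- tcrossprod(voted * 1)

rates <- agree / both
round(rates, 4)
```

One caveat: a pair of members with no issue in common would give 0/0, i.e. `NaN`, so that case would need separate handling.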

2 Answers

1

Maybe you can try `combn`, as shown below:

b <- diag(nrow(a))
# column sum 0 or 2: same vote; NA: at least one member did not vote
b[lower.tri(b)] <- combn(nrow(a), 2, FUN = function(k) {
  v <- colSums(a[k, ])
  sum(v %% 2 == 0, na.rm = TRUE) / sum(!is.na(v))
})
b[upper.tri(b)] <- t(b)[upper.tri(b)]

which gives

> b
          [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
[1,] 1.0000000 0.5555556 0.5000000 0.4444444 0.6666667 0.6666667
[2,] 0.5555556 1.0000000 0.1428571 0.8750000 0.5000000 0.6250000
[3,] 0.5000000 0.1428571 1.0000000 0.3750000 0.5714286 0.2857143
[4,] 0.4444444 0.8750000 0.3750000 1.0000000 0.5000000 0.3750000
[5,] 0.6666667 0.5000000 0.5714286 0.5000000 1.0000000 0.3333333
[6,] 0.6666667 0.6250000 0.2857143 0.3750000 0.3333333 1.0000000

Data

> dput(a)
structure(c(0L, NA, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 
1L, 0L, 1L, 0L, 0L, 0L, 0L, NA, NA, 1L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 1L, 1L, NA, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, NA, NA), .Dim = c(6L, 
10L))
ThomasIsCoding
  • This looks close to what I need! I just included the result I am aiming for in the question. Thanks! – snitsova Sep 01 '20 at 23:16
  • @snitsova You are welcome. Could you explain a bit why `b[2,1]` is `5/9`? How do the `5` and `9` come? I have no idea about the logic behind – ThomasIsCoding Sep 02 '20 at 07:28
  • `b[2,1]` is `5/9` because members of parliament 2 and 1 voted the same way -- either both said No (0,0) or both said Yes (1,1) -- 5 times and the number of issues members of parliament 2 and 1 both voted on, irrespective of whether they both said Yes or No or differed in their vote, is 9 (the NA in `a[2,1]` indicates that member of parliament 2 did not vote on issue 1). Does this make sense? – snitsova Sep 02 '20 at 13:00
  • @snitsova Yes, now it seems clear to me. Please check out my update – ThomasIsCoding Sep 02 '20 at 13:16
0

Here is a solution using a for loop.

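b <- matrix(nrow=nrow(a), ncol=nrow(a))

for (i in 1:nrow(a)) {
  for (j in 1:nrow(a)) {
    # numerator: issues where i and j cast the same vote (which() drops NA comparisons)
    # denominator: all issues minus those where i or j did not vote
    b[j, i] <- length(which(a[i,] == a[j,]))/
      (length(a[i,]) -
         length(union(which(is.na(a[i, ])), which(is.na(a[j, ])))))
  }
}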
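As for the error in the original loop, it appears to come from rows without any `NA`: `which(is.na(...))` is then empty, so the `ifelse()` test has length zero and so does the value being assigned. A minimal sketch of that mechanism, with illustrative names (`row_i`, `row_j`, `na_i`, `na_j`, `test`, `denom`, `value`) and placeholder denominators 9 and 8:

```r
row_i <- c(0, 0, 0, 0, 0, 1, 1, 0, 0, 1)   # row 1 of `a`: no NAs
row_j <- c(NA, 1, 1, 0, 0, 1, 1, 1, 0, 0)  # row 2 of `a`: one NA

na_i <- which(is.na(row_i))  # integer(0), because row_i has no NAs
na_j <- which(is.na(row_j))  # position 1

# Same shape as the test in the original ifelse() call
test <- (na_i %in% na_j) == 0  # zero-length logical vector

# ifelse() returns one value per element of `test`, so nothing here
denom <- ifelse(test, 9, 8)

value <- length(which(row_i == row_j)) / denom
length(value)  # 0

# Assigning a zero-length value to a single cell raises the reported error
b <- matrix(nrow = 2, ncol = 2)
res <- tryCatch({ b[1, 2] <- value; "ok" }, error = function(e) conditionMessage(e))
res
```

The `union()` version above avoids this because `length(union(...))` is always a single number, even when neither row has an `NA`.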
snitsova