R - Create a Matrix of Variables with the frequency of same values

Question

I'm facing a problem in R that I can't solve myself.

I have a data frame that looks like this, but with more variables and cases:

ID      Var1   Var2   Var3   Var4
1          1      0      1      1
2          0      0      0      0
3          1      1      1      1
4          1      1      0      1
5          1      0      1      0

I would like to have, similar to a correlation matrix, a matrix that shows how often a pair of variables share the same value, for example the value "1". The resulting matrix for the df above should then look like this:

           Var1   Var2   Var3   Var4
Var1                2      3      3
Var2                       1      2
Var3                              2
Var4

Perhaps you can help. Thank you in advance.

score 1 · Answer 1 · answered Feb 18 '21 at 12:55

You can try crossprod, as shown below:

replace(m <- crossprod(as.matrix(df[-1])), lower.tri(m, diag = TRUE), NA)

which gives

     Var1 Var2 Var3 Var4
Var1   NA    2    3    3
Var2   NA   NA    1    2
Var3   NA   NA   NA    2
Var4   NA   NA   NA   NA
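
As a brief check of why this works (a sketch, not part of the original answer): crossprod(X) computes t(X) %*% X, so the entry for two columns is the sum of their row-wise products. For 0/1 data that product is 1 only in rows where both columns are 1, so the sum is the count the question asks for. The names X, by_hand and via_crossprod are illustrative.

```r
# Data as given in the answer's dput() output
df <- data.frame(
  ID   = 1:5,
  Var1 = c(1L, 0L, 1L, 1L, 1L),
  Var2 = c(0L, 0L, 1L, 1L, 0L),
  Var3 = c(1L, 0L, 1L, 0L, 1L),
  Var4 = c(1L, 0L, 1L, 1L, 0L)
)

X <- as.matrix(df[-1])  # drop the ID column

# Count rows where Var1 and Var3 are both 1, computed directly
by_hand <- sum(X[, "Var1"] * X[, "Var3"])

# The same entry taken from the cross-product matrix
via_crossprod <- crossprod(X)["Var1", "Var3"]

by_hand == via_crossprod
```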

Data

> dput(df)
structure(list(ID = 1:5, Var1 = c(1L, 0L, 1L, 1L, 1L), Var2 = c(0L,
0L, 1L, 1L, 0L), Var3 = c(1L, 0L, 1L, 0L, 1L), Var4 = c(1L, 0L, 
1L, 1L, 0L)), class = "data.frame", row.names = c(NA, -5L))

ThomasIsCoding


Thank you! Can crossprod also be used to refer to a specific value, as jay.sf proposed in the other answer? – kuli Feb 18 '21 at 18:07
Nice option with `lower.tri` – akrun Feb 18 '21 at 18:53
@kuli This works only if the values are `0` and `1`. – ThomasIsCoding Feb 18 '21 at 23:32
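
One possible way around that limitation, offered as a sketch rather than something the answerer proposed: first turn the data into a 0/1 indicator matrix for the value of interest, then apply crossprod to the indicator. The names value, ind and res are illustrative, and the example assumes the data contain no NA.

```r
# Data as given in the answer's dput() output
df <- data.frame(
  ID   = 1:5,
  Var1 = c(1L, 0L, 1L, 1L, 1L),
  Var2 = c(0L, 0L, 1L, 1L, 0L),
  Var3 = c(1L, 0L, 1L, 0L, 1L),
  Var4 = c(1L, 0L, 1L, 1L, 0L)
)

value <- 0  # the value to count; any value present in the data would do

# 1 where a cell equals `value`, 0 otherwise
ind <- (as.matrix(df[-1]) == value) * 1

# Pairwise counts of rows where both columns equal `value`
m <- crossprod(ind)

# Keep only the upper triangle, as in the question
res <- replace(m, lower.tri(m, diag = TRUE), NA)
res
```

With value <- 1 this gives the same counts as the crossprod call in the answer.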

score 0 · Accepted Answer · answered Feb 18 '21 at 12:52

First create a logical matrix that tests for your value, here 1.

e <- df[-1] == 1  ## value to test; df is the data frame without its ID column

Then use outer to compare the columns pairwise with a function that counts the rows in which both columns are TRUE, i.e. where the two logical values sum to 2. From the result you apparently want to remove the lower triangle, including the diagonal, which lower.tri selects.

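## count the rows in which columns i and j are both TRUE
FUN <- Vectorize(function(i, j) sum(e[, i] + e[, j] == 2))
(res <- t(outer(1:ncol(e), 1:ncol(e), FUN)))
## blank out the lower triangle and the diagonal
res[lower.tri(res, diag = TRUE)] <- NA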
res
#      [,1] [,2] [,3] [,4]
# [1,]   NA    2    3    3
# [2,]   NA   NA    1    2
# [3,]   NA   NA   NA    2
# [4,]   NA   NA   NA   NA
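
The result above is indexed by position rather than by variable name. A minimal sketch of one way to carry the column names over, assuming the data frame is called df as in the other answer's dput() output; the helper name idx is illustrative, and the counting function uses & instead of the sum-to-2 test, which should be equivalent for logical input.

```r
# Data as given in the other answer's dput() output
df <- data.frame(
  ID   = 1:5,
  Var1 = c(1L, 0L, 1L, 1L, 1L),
  Var2 = c(0L, 0L, 1L, 1L, 0L),
  Var3 = c(1L, 0L, 1L, 0L, 1L),
  Var4 = c(1L, 0L, 1L, 1L, 0L)
)

e <- df[-1] == 1  # logical matrix: TRUE where the cell equals 1

# Count rows in which columns i and j are both TRUE
FUN <- Vectorize(function(i, j) sum(e[, i] & e[, j]))

idx <- seq_len(ncol(e))
res <- outer(idx, idx, FUN)

# Label rows and columns with the variable names
dimnames(res) <- list(colnames(e), colnames(e))

# Keep only the upper triangle
res[lower.tri(res, diag = TRUE)] <- NA
res
```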

Thank you. Worked like a charm ;o) – kuli Feb 18 '21 at 18:06
