
I'm facing a problem in R that I can't solve myself.

I have a data frame like the one below (the real one has more variables and cases):

ID      Var1   Var2   Var3   Var4
1          1      0      1      1
2          0      0      0      0
3          1      1      1      1
4          1      1      0      1
5          1      0      1      0

I'd like to have, similar to a correlation matrix, a matrix that shows how often each pair of variables both take the same value, for example the value "1". For the data frame above, the resulting matrix should look like this:

           Var1   Var2   Var3   Var4
Var1                2      3      3
Var2                       1      2
Var3                              2
Var4                              

Perhaps you can help. Thank you in advance.

kuli

2 Answers


You can try crossprod(), like below:

replace(m <- crossprod(as.matrix(df[-1])), lower.tri(m, diag = TRUE), NA)

which gives

     Var1 Var2 Var3 Var4
Var1   NA    2    3    3
Var2   NA   NA    1    2
Var3   NA   NA   NA    2
Var4   NA   NA   NA   NA

Data

> dput(df)
structure(list(ID = 1:5, Var1 = c(1L, 0L, 1L, 1L, 1L), Var2 = c(0L,
0L, 1L, 1L, 0L), Var3 = c(1L, 0L, 1L, 0L, 1L), Var4 = c(1L, 0L, 
1L, 1L, 0L)), class = "data.frame", row.names = c(NA, -5L))
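Why this works: crossprod(X) computes t(X) %*% X, so entry [i, j] is the sum over all rows of X[, i] * X[, j]. With 0/1 data that product is 1 only when both variables are 1, which is exactly the requested count, so the trick answers the question directly only for the value 1. A minimal sketch for an arbitrary value, assuming the data contain no NAs, is to build a 0/1 indicator matrix first. The helper name pair_match_count() is hypothetical, not part of this answer or of any package:

```r
# Hypothetical helper (not from the answer): for every pair of columns,
# count the rows in which both columns equal `value`.
pair_match_count <- function(dat, value = 1) {
  # 0/1 indicator: 1 where the cell equals `value`, 0 otherwise (no NAs assumed)
  ind <- (as.matrix(dat) == value) * 1
  # crossprod(ind) = t(ind) %*% ind; entry [i, j] counts rows where both columns hit `value`
  m <- crossprod(ind)
  m[lower.tri(m, diag = TRUE)] <- NA  # keep only the upper triangle
  m
}

# The example data from the question (same as the dput() above)
df <- data.frame(ID   = 1:5,
                 Var1 = c(1L, 0L, 1L, 1L, 1L),
                 Var2 = c(0L, 0L, 1L, 1L, 0L),
                 Var3 = c(1L, 0L, 1L, 0L, 1L),
                 Var4 = c(1L, 0L, 1L, 1L, 0L))

pair_match_count(df[-1], value = 1)  # same counts as the crossprod() answer
pair_match_count(df[-1], value = 0)  # pairs in which both variables are 0
```

Multiplying the logical comparison by 1 turns it into numbers before the matrix product; with value = 1 on the question's 0/1 data this reproduces the matrix shown above.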
ThomasIsCoding

First, create a logical evaluation matrix that tests each cell for your value, here 1.

e <- df[-1] == 1  ## value to test; df is the data frame from the question

Then use outer() to compare the columns pairwise with a function that counts how often both entries are TRUE, i.e. how often the two values sum to 2. From the result you apparently want to remove the lower triangle, including the diagonal.

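# outer() passes whole index vectors, so the scalar counter is Vectorize()d
FUN <- Vectorize(function(i, j) sum(e[, i] + e[, j] == 2))
res <- outer(seq_len(ncol(e)), seq_len(ncol(e)), FUN)
res[lower.tri(res, diag = TRUE)] <- NA  # blank out lower triangle and diagonal
res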
#      [,1] [,2] [,3] [,4]
# [1,]   NA    2    3    3
# [2,]   NA   NA    1    2
# [3,]   NA   NA   NA    2
# [4,]   NA   NA   NA   NA
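outer() evaluates FUN on all n x n column pairs even though only the upper triangle is kept. A sketch of an alternative (a swapped-in technique, not part of this answer) visits each pair i < j once via combn() and keeps the variable names. It uses & instead of the "sum to 2" test, which is equivalent for logical columns without NAs:

```r
# The example data from the question
df <- data.frame(ID   = 1:5,
                 Var1 = c(1L, 0L, 1L, 1L, 1L),
                 Var2 = c(0L, 0L, 1L, 1L, 0L),
                 Var3 = c(1L, 0L, 1L, 0L, 1L),
                 Var4 = c(1L, 0L, 1L, 1L, 0L))

e <- df[-1] == 1              # logical matrix: does each cell equal 1?
pairs <- combn(ncol(e), 2)    # each column is one pair (i, j) with i < j

# Result matrix named after the variables, NA everywhere by default
res <- matrix(NA_integer_, ncol(e), ncol(e),
              dimnames = list(colnames(e), colnames(e)))

# Count rows where both columns are TRUE; the two-column index matrix
# t(pairs) addresses exactly the upper-triangle cells [i, j]
res[t(pairs)] <- apply(pairs, 2, function(p) sum(e[, p[1]] & e[, p[2]]))
res
```

Only the n(n - 1)/2 needed pairs are computed; for a handful of variables the speed difference is negligible, but the named rows and columns make the result easier to read.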
jay.sf