I have a set of categorical variables in one-hot encoding. I'm trying to build something like a correlation matrix, but instead of correlations I want to count how many times each pair of variables is "on" together (that is, the number of cases where both variables are 1). I know I can compute this for one pair by multiplying the two vectors element-wise and summing the result, since only the cases where both are 1 contribute to the sum. But I can't think of a way to build the full matrix. For example, I have this dataset:

 A B C D E
 1 1 0 1 0
 0 1 0 0 1
 0 0 1 1 1
 0 0 1 0 1
 0 0 0 0 1

I need a matrix like this (the diagonal values don't really matter):

  A B C D E
A - 1 0 1 0
B 1 - 0 1 1
C 0 0 - 1 2
D 1 1 1 - 1
E 0 1 2 1 -

Notice, for example, that E-C is 2 because both variables were on (1) in two cases.
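
A minimal sketch of the multiply-and-sum idea applied to every pair of columns, assuming the data is held as a plain Python list of rows. The names `columns`, `rows`, and `cooccurrence` are made up for illustration and are not from any library.

```python
# Example data from the question: one row per case, one column per variable.
columns = ["A", "B", "C", "D", "E"]
rows = [
    [1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
]


def cooccurrence(rows):
    """Return an n x n matrix whose (i, j) entry counts the rows where
    columns i and j are both 1 (the sum of their element-wise product)."""
    n = len(rows[0])
    return [
        [sum(row[i] * row[j] for row in rows) for j in range(n)]
        for i in range(n)
    ]


counts = cooccurrence(rows)

# Print with labels; the diagonal holds each variable's own total.
print("  " + " ".join(columns))
for name, line in zip(columns, counts):
    print(name + " " + " ".join(str(v) for v in line))
```

This is the same computation as multiplying the transposed data matrix by the data matrix, so with the rows in a NumPy array `X` it could be written as `X.T @ X`.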

pogibas
user3639100
