
I have a matrix of binary (0/1) variables (it is built with cbind, so it is a matrix rather than a data frame):

df <- cbind(v1 = c(0,1,0,0,0,1), v2 = c(1,1,0,1,0,1), v3 = c(1,1,1,1,1,1))

df

#     v1 v2 v3
#[1,]  0  1  1
#[2,]  1  1  1
#[3,]  0  0  1
#[4,]  0  1  1
#[5,]  0  0  1
#[6,]  1  1  1

For each pair of variables, I would like to know the number of rows in which both variables equal 1. In other words, I want to count the rows in which v1 and v2 are both 1, the rows in which v1 and v3 are both 1, and the rows in which v2 and v3 are both 1. Ideally, the result would be a table listing (1) each pair of variables and (2) the number of rows in which both are 1. For this example:

#pair   n
#v1-v2  2  
#v1-v3  2 
#v2-v3  4   
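A direct way to express this request in base R is to loop over every pair of columns with combn and count the rows where both columns are 1. This is a minimal sketch assuming a numeric 0/1 matrix (or data frame) with named columns; pair_counts_combn is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper (not part of base R): for every pair of columns,
# count the rows in which both columns equal 1.
pair_counts_combn <- function(m) {
  m <- as.matrix(m)                    # accept a matrix or a data frame
  pairs <- combn(colnames(m), 2)       # 2 x k character matrix, one pair per column
  n <- apply(pairs, 2, function(p) sum(m[, p[1]] == 1 & m[, p[2]] == 1))
  data.frame(pair = paste(pairs[1, ], pairs[2, ], sep = "-"),
             n = n,
             stringsAsFactors = FALSE)
}

df <- cbind(v1 = c(0,1,0,0,0,1), v2 = c(1,1,0,1,0,1), v3 = c(1,1,1,1,1,1))
pair_counts_combn(df)
```

On the example matrix this returns v1-v2 = 2, v1-v3 = 2 and v2-v3 = 4, matching the table above. With about 50 variables it evaluates choose(50, 2) = 1225 pairs, each requiring one pass over the rows.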

I am working with a much larger dataset (around 50 binary variables, i.e. 1,225 pairs), so I am looking for a straightforward way to do this. I would very much appreciate the help.

phil
  • Possible duplicate: [Create a co-occurrence matrix from dummy-coded observations](https://stackoverflow.com/questions/10622730/create-a-co-occurrence-matrix-from-dummy-coded-observations). Maybe proceed to long format. – Henrik Mar 22 '21 at 14:00

1 Answer


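Try crossprod. For a 0/1 matrix, crossprod(df) is t(df) %*% df, so entry [i, j] is the number of rows in which both variable i and variable j are 1. The off-diagonal entries are the pairwise counts (e.g. v2 and v3 are both 1 in 4 rows), and the diagonal gives the number of 1s in each variable: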

> crossprod(df)
   v1 v2 v3
v1  2  2  2
v2  2  4  4
v3  2  4  6
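crossprod returns a symmetric co-occurrence matrix rather than the pair list asked for in the question. A minimal sketch of reshaping it, assuming the columns are named: keep only the upper triangle (each unordered pair once, diagonal excluded) and read off the row/column indices with which(..., arr.ind = TRUE). pair_counts is a hypothetical helper name; as.matrix is included because crossprod expects a numeric matrix, so a genuine data frame should be converted first.

```r
# Hypothetical helper: reshape the crossprod() co-occurrence matrix into a pair table.
pair_counts <- function(m) {
  m   <- as.matrix(m)                            # crossprod() expects a numeric matrix
  co  <- crossprod(m)                            # co[i, j] = rows where both i and j are 1
  idx <- which(upper.tri(co), arr.ind = TRUE)    # each unordered pair once, no diagonal
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]  # v1-v2, v1-v3, ..., v2-v3, ...
  data.frame(
    pair = paste(rownames(co)[idx[, "row"]], colnames(co)[idx[, "col"]], sep = "-"),
    n    = co[idx],                              # matrix indexing by (row, col) pairs
    stringsAsFactors = FALSE
  )
}

df <- cbind(v1 = c(0,1,0,0,0,1), v2 = c(1,1,0,1,0,1), v3 = c(1,1,1,1,1,1))
pair_counts(df)
```

On the example this gives the same three rows as the desired output (v1-v2 2, v1-v3 2, v2-v3 4). For 50 variables it returns choose(50, 2) = 1225 rows, all taken from a single matrix product.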
ThomasIsCoding