Finding the number of co-occurrences of multiple binary variables in R

Question

I have a data set with a series of binary variables (built here with cbind, so strictly a matrix rather than a data frame):

df <- cbind(v1 = c(0,1,0,0,0,1), v2 = c(1,1,0,1,0,1), v3 = c(1,1,1,1,1,1))

df

#     v1 v2 v3
#[1,]  0  1  1
#[2,]  1  1  1
#[3,]  0  0  1
#[4,]  0  1  1
#[5,]  0  0  1
#[6,]  1  1  1

For each pair of variables, I would like to know the number of rows in which both are 1. In other words, I want the number of rows in which v1 and v2 are both 1, v1 and v3 are both 1, and v2 and v3 are both 1. Ideally, in this specific case, the result would be a list giving (1) each pair of variables and (2) the number of rows in which it occurred.

#pair   n
#v1-v2  2  
#v1-v3  2 
#v2-v3  4

I am working on a much larger dataset (around 50 binary variables), so I am looking for a straightforward way to do this. I would very much appreciate the help.

Possible duplicate: [Create a co-occurrence matrix from dummy-coded observations](https://stackoverflow.com/questions/10622730/create-a-co-occurrence-matrix-from-dummy-coded-observations). Maybe proceed to long format. — Henrik, Mar 22 '21 at 14:00

score 3 · Accepted Answer · answered Mar 22 '21 at 13:59


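Try crossprod. For a 0/1 matrix it computes t(df) %*% df, so each off-diagonal entry is the number of rows in which both variables are 1, and the diagonal holds each variable's own count of 1s.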

> crossprod(df)
   v1 v2 v3
v1  2  2  2
v2  2  4  4
v3  2  4  6
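
To get from this matrix to the pair/n list asked for in the question, one option is to keep only the entries above the diagonal. This is a minimal sketch, assuming `df` is the matrix from the question (a real data.frame would need `as.matrix()` first); the names `co`, `idx`, and `pairs` are only illustrative:

```r
df <- cbind(v1 = c(0,1,0,0,0,1), v2 = c(1,1,0,1,0,1), v3 = c(1,1,1,1,1,1))

# t(df) %*% df: entry [i, j] counts rows where columns i and j are both 1
co <- crossprod(df)

# positions strictly above the diagonal, so each unordered pair appears once
idx <- which(upper.tri(co), arr.ind = TRUE)

# build the pair label from the row/column names and look up the count
pairs <- data.frame(
  pair = paste(rownames(co)[idx[, "row"]], colnames(co)[idx[, "col"]], sep = "-"),
  n    = co[idx]
)
pairs
```

With around 50 variables this yields 50 * 49 / 2 = 1225 rows, one per pair, without any explicit loop.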

ThomasIsCoding


crossprod worked perfectly. Note also that, to reshape the result into a list (so I could extract the unique elements), I used as.data.frame(as.table(crossprod(df))) – phil Mar 23 '21 at 12:26

@phil Yes, you got it! – ThomasIsCoding Mar 23 '21 at 12:34
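
For anyone following the as.table route from the comment above: the long table lists every ordered pair plus the diagonal, so one extra filtering step is needed to keep each pair once. A rough sketch, assuming the same `df` as in the question; `long` is a hypothetical name, and `Var1`, `Var2`, `Freq` are the default column names that `as.data.frame()` gives a table without named dimensions:

```r
df <- cbind(v1 = c(0,1,0,0,0,1), v2 = c(1,1,0,1,0,1), v3 = c(1,1,1,1,1,1))

# one row per (Var1, Var2) combination, with the co-occurrence count in Freq
long <- as.data.frame(as.table(crossprod(df)))

# Var1 and Var2 are factors whose levels follow the column order,
# so comparing their integer codes drops the diagonal and mirrored pairs
long <- long[as.integer(long$Var1) < as.integer(long$Var2), ]
long
```

The choice between this and indexing the matrix directly is mostly a matter of taste; the long format is convenient if the counts are going to be merged or plotted afterwards.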
