
Is there a simple R function that converts binary columns into a square matrix, subject to a condition?

I have the following source data frame:


structure(list(SHOES = c(0,0,0,0,0,0,0,0,0),  
               LEATHER = c(0,0,0,0,0,0,0,0,0), 
               SPORTSWEAR = c(1,1,1,0,0,0,1,0,0), 
               SHIRTS = c(1,0,1,0,0,0,0,0,0), 
               SUITS = c(0,0,1,0,0,0,0,0,1)), 
          .Names = c("SHOES", "LEATHER", "SPORTSWEAR", "SHIRTS", "SUITS"), 
          class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -9L))

The result I hope to get is a square matrix of pairwise counts. Condition: for each pair of columns in the source data frame, a row counts as 1 if both columns are 1 in that row; these counts are then summed over all rows.

Example 1: SPORTSWEAR & SHIRTS have 2 occurrences (rows where both equal 1), so the aggregate count is 2; others remain 0.

Example 2: SHIRTS & SUITS have 1 occurrence (a row where both equal 1), so the aggregate count is 1; others remain 0.
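To make the condition concrete, the two examples can be checked by hand for single pairs of columns. This is only a minimal sketch: `pair_count` is a hypothetical helper name introduced for illustration, and the vectors are copied from the data frame above.

```r
# Three of the columns from the source data frame above.
SPORTSWEAR <- c(1, 1, 1, 0, 0, 0, 1, 0, 0)
SHIRTS     <- c(1, 0, 1, 0, 0, 0, 0, 0, 0)
SUITS      <- c(0, 0, 1, 0, 0, 0, 0, 0, 1)

# Hypothetical helper: a row contributes 1 only when both columns are 1.
pair_count <- function(x, y) sum(x == 1 & y == 1)

pair_count(SPORTSWEAR, SHIRTS)  # 2 (rows 1 and 3)
pair_count(SHIRTS, SUITS)       # 1 (row 3)
```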


Henrik
yc.koong
  • Related: [Create a co-occurrence matrix from dummy-coded observations](https://stackoverflow.com/questions/10622730/create-a-co-occurrence-matrix-from-dummy-coded-observations). You only need to tweak the replacement with zeros. See my [comment below](https://stackoverflow.com/questions/53362348/r-convert-binary-columns-into-square-matrix-with-condition#comment93602158_53362479). – Henrik Nov 18 '18 at 16:08

1 Answer

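# For 0/1 columns, sum(x*y) counts the rows where both columns are 1
m = sapply(df1, function(x) sapply(df1, function(y) sum(x*y)))
# Keep only the strict lower triangle; set the diagonal and upper triangle to 0
replace(m, !lower.tri(m), 0)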
#           SHOES LEATHER SPORTSWEAR SHIRTS SUITS
#SHOES          0       0          0      0     0
#LEATHER        0       0          0      0     0
#SPORTSWEAR     0       0          0      0     0
#SHIRTS         0       0          2      0     0
#SUITS          0       0          1      1     0

DATA

df1 = structure(list(SHOES = c(0, 0, 0, 0, 0, 0, 0, 0, 0), LEATHER = c(0, 
0, 0, 0, 0, 0, 0, 0, 0), SPORTSWEAR = c(1, 1, 1, 0, 0, 0, 1, 
0, 0), SHIRTS = c(1, 0, 1, 0, 0, 0, 0, 0, 0), SUITS = c(0, 0, 
1, 0, 0, 0, 0, 0, 1)), class = "data.frame", row.names = c(NA, 
-9L))
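
As an alternative sketch, the same pairwise counts can be obtained with matrix algebra, assuming every column contains only 0s and 1s: `crossprod(X)` computes `t(X) %*% X`, whose entry for two columns is the number of rows where both are 1. The names `X` and `m2` are introduced here for illustration only.

```r
df1 <- data.frame(
  SHOES      = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
  LEATHER    = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
  SPORTSWEAR = c(1, 1, 1, 0, 0, 0, 1, 0, 0),
  SHIRTS     = c(1, 0, 1, 0, 0, 0, 0, 0, 0),
  SUITS      = c(0, 0, 1, 0, 0, 0, 0, 0, 1)
)

# Co-occurrence counts for every pair of columns (assumes 0/1 values only).
X  <- as.matrix(df1)
m2 <- crossprod(X)  # same as t(X) %*% X

# Zero the diagonal and upper triangle so each pair appears once.
m2[!lower.tri(m2)] <- 0
m2
```

On this data the result should match the `sapply` output above; `crossprod` avoids the nested loops, which may matter for data frames with many columns.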
d.b