Let's say I have a data frame like the one below, in which each person can have multiple diagnoses (dx).

person dx1 dx2 dx3 dx4
A      Y   Y   N   N
B      N   N   Y   Y
C      Y   Y   N   Y
...

Now, how could I generate a tabulation like the one below, giving the count for each possible combination of dx? The counts here are made up for demonstration purposes. In the first row, for example, 2 persons have dx1 (but nothing else) and 1 person has both dx1 and dx2.

N    dx1 dx2 dx3 dx4
dx1   2   1   0   0
dx2   0   1   1   0
dx3   1   2   1   1
dx4   0   0   1   0

Your kind help is greatly appreciated!

Best regards, Jie

  • So what would the expected output be for your example data? The one you have provided? How would the third person, who has three entries, be counted? – Roman Jul 24 '20 at 13:26

2 Answers


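Maybe you can try crossprod on the logical matrix df[-1] == "Y". Each off-diagonal entry counts the persons who have both diagnoses, and each diagonal entry counts the persons who have that diagnosis at all (not only that diagnosis):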

> crossprod(df[-1]=="Y")
    dx1 dx2 dx3 dx4
dx1   2   2   0   1
dx2   2   2   0   1
dx3   0   0   1   1
dx4   1   1   1   2
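
For readers wondering why this works, here is a minimal sketch, assuming the three-row example data from the question (rebuilt here so the snippet runs on its own). crossprod(x) computes t(x) %*% x, and the logical values are treated as 0/1, so entry [i, j] sums, over persons, whether both dx i and dx j are "Y". The names ind, co and manual are only illustrative.

```r
# Example data: the three persons shown in the question
df <- data.frame(
  person = c("A", "B", "C"),
  dx1 = c("Y", "N", "Y"),
  dx2 = c("Y", "N", "Y"),
  dx3 = c("N", "Y", "N"),
  dx4 = c("N", "Y", "Y")
)

# Logical indicator matrix: TRUE where the person has the diagnosis
ind <- df[-1] == "Y"

# crossprod(ind) is t(ind) %*% ind; TRUE/FALSE are coerced to 1/0
co <- crossprod(ind)
manual <- t(ind) %*% ind

# Both routes give the same co-occurrence counts
stopifnot(all(co == manual))

# The diagonal is simply the number of persons with each diagnosis
stopifnot(all(diag(co) == colSums(ind)))
```

The matrix is symmetric, because "has dx1 and dx2" is the same event as "has dx2 and dx1".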
ThomasIsCoding

I think you can do this via outer:

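# Diagnosis columns (everything except `person`)
cols <- names(df)[-1]
# Number of persons with 'Y' in both column x and column y
apply_fun <- function(x, y) sum(df[, x] == 'Y' & df[, y] == 'Y')
# outer() passes whole vectors, so apply_fun must be vectorized first
mat <- outer(cols, cols, Vectorize(apply_fun))
dimnames(mat) <- list(cols, cols)
mat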

#    dx1 dx2 dx3 dx4
#dx1   2   2   0   1
#dx2   2   2   0   1
#dx3   0   0   1   1
#dx4   1   1   1   2

data

df <- structure(list(person = c("A", "B", "C"), dx1 = c("Y", "N", "Y"
), dx2 = c("Y", "N", "Y"), dx3 = c("N", "Y", "N"), dx4 = c("N", 
"Y", "Y")), class = "data.frame", row.names = c(NA, -3L))
Ronak Shah