I have a data frame like the one below. Each row has a unique id, and type defines the group that the id belongs to.

id  type
a   a1
b   a1
c   a2
d   a3
e   a4
f   a4

I want to build a matrix like the one below. An entry should be 1 if the two ids belong to the same type, and 0 otherwise.

    a   b   c   d   e   f
a   1   1   0   0   0   0
b   1   1   0   0   0   0
c   0   0   1   0   0   0
d   0   0   0   1   0   0
e   0   0   0   0   1   1
f   0   0   0   0   1   1

The data frame is large (over 70 thousand rows), and I do not know how to do this efficiently in R. Any suggestions would be appreciated.

niuyw

1 Answer

Here is a base R solution. You can use either of the following:

M <- crossprod(t(table(df)))

or

M <- crossprod(table(rev(df)))

which gives

> M
   id
id  a b c d e f
  a 1 1 0 0 0 0
  b 1 1 0 0 0 0
  c 0 0 1 0 0 0
  d 0 0 0 1 0 0
  e 0 0 0 0 1 1
  f 0 0 0 0 1 1
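
To see why both expressions work: table(df) is an id-by-type indicator matrix, and multiplying it by its own transpose counts the types two ids share, which is 1 or 0 when each id has exactly one type. The sketch below rebuilds the example data and checks the result against a direct pairwise comparison with outer(). The names X, M1, M2 and M3 are illustrative only, and tcrossprod() is shown as an equivalent way to write the same product, not as something the answer above used.

```r
# Example data, same as in the DATA section
df <- data.frame(id   = c("a", "b", "c", "d", "e", "f"),
                 type = c("a1", "a1", "a2", "a3", "a4", "a4"))

X  <- table(df)          # id x type indicator matrix (one 1 per row)
M1 <- crossprod(t(X))    # t(t(X)) %*% t(X), i.e. X %*% t(X)
M2 <- tcrossprod(X)      # same product, without the explicit transpose

# Direct pairwise comparison of types, for checking
M3 <- outer(df$type, df$type, "==") * 1L
dimnames(M3) <- list(df$id, df$id)

all(M1 == M3)
all(M2 == M3)
```

Note that table() sorts the ids, so the rows and columns of M1 and M2 follow sorted id order, which matches the row order of df here only because the example ids are already sorted.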

DATA

df <- structure(list(id = c("a", "b", "c", "d", "e", "f"), type = c("a1", 
"a1", "a2", "a3", "a4", "a4")), class = "data.frame", row.names = c(NA, 
-6L))
ThomasIsCoding
  • Hi, thank you for your help. It is very useful. I have another question, though: since my data frame is very large (over 10 thousand rows), I found the approach above quite slow. Is there any way to speed it up? Please ignore this if it is unrealistic. – niuyw Dec 28 '19 at 15:13
  • @niuyw As far as I know, this is probably already a very efficient approach... – ThomasIsCoding Dec 28 '19 at 17:23
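
On the speed concern raised in the comments, one option that may help is to keep everything sparse. A dense 70,000 x 70,000 matrix has 4.9 billion entries (roughly 39 GB as doubles), whereas a sparse result stores only the pairs that share a type. The following is a minimal sketch, assuming the Matrix package is available; it is not part of the original answer, and type_f, X and M_sparse are illustrative names. Whether it is actually faster depends on how large the groups are.

```r
library(Matrix)

# Example data, same as in the DATA section
df <- data.frame(id   = c("a", "b", "c", "d", "e", "f"),
                 type = c("a1", "a1", "a2", "a3", "a4", "a4"))

type_f <- factor(df$type)

# Sparse indicator matrix: one row per id, one column per type,
# with a single 1 in each row marking that id's type
X <- sparseMatrix(i = seq_len(nrow(df)),
                  j = as.integer(type_f),
                  x = 1,
                  dims = c(nrow(df), nlevels(type_f)),
                  dimnames = list(df$id, levels(type_f)))

# X %*% t(X): entry (i, j) is 1 when ids i and j share a type.
# The result stays sparse, so zeros are never stored.
M_sparse <- tcrossprod(X)

# Only convert to a dense matrix when the data is small enough
as.matrix(M_sparse)
```

Unlike table(), this keeps the rows and columns in the original row order of df. The number of stored entries grows with the sum of squared group sizes, so a few very large groups would reduce the benefit.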