Compute covariance matrix from list of occurrences

Question

I have the following data frame:

# my_data
id  cg
1   a
2   b
3   a
3   b
4   b
4   c
5   b
5   c
5   d
6   d

I would like to compute the covariance of the values of cg. I believe I can obtain it by using cov() on the following matrix, where every cell counts the number of co-occurrences between two values of cg.

# my_matrix
cg  a  b  c  d
a   2  1  0  0
b   1  4  2  1
c   0  2  2  1
d   0  1  1  2

What is the quickest way to go from my_data to my_matrix? Please be aware that cg contains more than 700 unique values.

If there is a better way to generate the covariance matrix, I am also interested in that.

Here is the code to generate my_data:

my_data <- structure(list(id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
                          cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d")),
                     .Names = c("id", "cg"),
                     class = "data.frame", row.names = c(NA, -10L))

I prepared a solution and posted it on the dupe link, as this was deleted earlier. – akrun Apr 26 '17 at 13:08

score 1 · Accepted Answer · answered Apr 26 '17 at 10:13

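We can use crossprod() with table(). table(my_data) builds the id-by-cg contingency table, and crossprod() multiplies its transpose by the table itself, which yields the co-occurrence counts: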

crossprod(table(my_data))
#    cg
#cg  a b c d
#  a 2 1 0 0
#  b 1 4 2 1
#  c 0 2 2 1
#  d 0 1 1 2
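
To see why this works, here is a minimal sketch that spells out the same computation step by step. The names inc and co_occ are illustrative and do not come from the original post. The sketch assumes each (id, cg) pair appears at most once in my_data; with repeated pairs the table would hold counts above 1 and the products would no longer be plain co-occurrence counts.

```r
# Rebuild the example data from the question.
my_data <- data.frame(
  id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
  cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d"),
  stringsAsFactors = FALSE
)

# Incidence matrix: one row per id, one column per cg value,
# 1 where the id has that cg value and 0 otherwise.
# unclass() drops the "table" class so we work with a plain integer matrix.
inc <- unclass(table(my_data))

# t(inc) %*% inc sums, over all ids, the product of two cg columns.
# Off-diagonal cells count ids that share both values;
# diagonal cells count how many ids have that value.
co_occ <- t(inc) %*% inc

# crossprod(x) computes t(x) %*% x, so both routes give the same values.
stopifnot(all(co_occ == crossprod(table(my_data))))

co_occ
```

Since the question ultimately asks for a covariance, one option that might be worth checking is cov(inc), which treats each id as an observation and each cg value as a variable. Whether that or cov() on the co-occurrence matrix is the intended quantity depends on the analysis, so treat this as a suggestion rather than the answer's method.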

akrun

Your solution worked. Thank you. But I now edited the original question in a way that this solution is no longer applicable. – Michele Apr 26 '17 at 14:04
@Michele Please post that as a new question as the answer was based on your original post. – akrun Apr 26 '17 at 16:18
Ok. The new question is here: http://stackoverflow.com/questions/43639805/compute-mean-jaccard-distance-between-elements-in-a-list – Michele Apr 26 '17 at 16:29
@Michele Thanks, I will check it out – akrun Apr 26 '17 at 16:30
