I have the following data frame:

# my_data
id  cg
1   a
2   b
3   a
3   b
4   b
4   c
5   b
5   c
5   d
6   d

I would like to compute the covariance between the values of cg. I believe I can obtain it by calling cov() on the following matrix, in which each cell counts the number of ids where two values of cg occur together (the diagonal counts the ids containing each value).

# my_matrix
cg  a  b  c  d
a   2  1  0  0
b   1  4  2  1
c   0  2  2  1
d   0  1  1  2

What is the quickest way to go from my_data to my_matrix? Please be aware that cg contains more than 700 unique values.

If there is a better way to generate the covariance matrix, I am also interested in that.

Here is the code to generate my_data:

my_data <- data.frame(id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
                      cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d"),
                      stringsAsFactors = FALSE)
Michele

1 Answer

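We can use crossprod with table. table(my_data) builds the id-by-cg incidence matrix, and crossprod multiplies its transpose by itself, which gives the cg-by-cg co-occurrence counts: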

crossprod(table(my_data))
#    cg
#cg  a b c d
#  a 2 1 0 0
#  b 1 4 2 1
#  c 0 2 2 1
#  d 0 1 1 2
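
To make the two steps explicit, here is a minimal sketch that separates them and checks that crossprod(x) equals t(x) %*% x. It assumes each (id, cg) pair appears at most once in my_data; repeated pairs would inflate the counts. The last part is only a suggestion for larger inputs and was not part of the original one-liner: it builds the same incidence matrix in sparse form with xtabs(sparse = TRUE), which relies on the Matrix package. The names inc, co_occ, inc_sparse and co_occ_sparse are made up for this illustration.

```r
# Data from the question
my_data <- data.frame(
  id = c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L),
  cg = c("a", "b", "a", "b", "b", "c", "b", "c", "d", "d"),
  stringsAsFactors = FALSE
)

# Step 1: id-by-cg incidence matrix (rows = id, columns = cg).
# Assumption: every (id, cg) pair occurs at most once, so entries are 0 or 1.
inc <- table(my_data)

# Step 2: crossprod(inc) computes t(inc) %*% inc, i.e. for every pair of
# cg values the number of ids in which both appear.
co_occ <- crossprod(inc)
stopifnot(all(co_occ == t(inc) %*% inc))

# Possible variant for many ids / many cg levels (an assumption, not the
# original answer): keep the incidence matrix sparse via the Matrix package.
library(Matrix)
inc_sparse <- xtabs(~ id + cg, data = my_data, sparse = TRUE)
co_occ_sparse <- crossprod(inc_sparse)
```

With roughly 700 unique cg values the dense result is only a 700 x 700 matrix, so the sparse variant mainly helps when the number of ids is large and the dense id-by-cg table would be costly to build.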
akrun
  • Your solution worked. Thank you. But I now edited the original question in a way that this solution is no longer applicable. – Michele Apr 26 '17 at 14:04
  • @Michele Please post that as a new question as the answer was based on your original post. – akrun Apr 26 '17 at 16:18
  • Ok. The new question is here: http://stackoverflow.com/questions/43639805/compute-mean-jaccard-distance-between-elements-in-a-list – Michele Apr 26 '17 at 16:29
  • @Michele Thanks, I will check it out – akrun Apr 26 '17 at 16:30