I have a data.table
like this one:
set.seed(12345)
mydt <- data.table(gr1 = sample(letters[1:2], size = 100, replace = TRUE),
gr2 = sample(letters[3:4], size = 100, replace = TRUE),
a = rnorm(100), b = rnorm(100), weight = rnorm(100, 5, 1))
The gr1
and gr2
specify the group membership of each case. I would like to obtain the correlation matrix from the cov.wt
function by the group membership specified by gr1
and gr2
, using the weight
column. The cov.wt
returns the correlation matrix if cor = TRUE
. I can split mydt
by gr1
and gr2
and then use lapply
to do the computations and extract each of the correlation matrices:
mydt <- split(x = mydt, by = c("gr1", "gr2"), drop = TRUE)
lapply(X = mydt, FUN = function(i) {
cov.wt(x = as.matrix(i[ , c("a", "b")]), wt = i[ , weight], cor = TRUE)[["cor"]]
})
I get exactly what I want:
$b.c
a b
a 0.99999999999999978 0.26861150206539375
b 0.26861150206539375 0.99999999999999978
$a.c
a b
a 0.99999999999999978 -0.13281683546112405
b -0.13281683546112405 1.00000000000000000
$b.d
a b
a 1.00000000000000000 -0.13064774898011455
b -0.13064774898011455 1.00000000000000000
$a.d
a b
a 0.99999999999999978 -0.61122086293705469
b -0.61122086293705458 0.99999999999999978
However, with large datasets this approach is rather slow. I would like to use the data.table
way to achieve this, just like the post from Dan Y under this question. I am struggling, thou, because of the extra parameters and the extraction of the correlation matrix from the list returned by the cov.wt
function. I tried the following (plus many variations):
mydt[ , .(cov.wt(as.matrix(a, b), wt = weight, cor = TRUE)["cor"]), by = c("gr1", "gr2")]
What I get at the end is just the first value of the diagonal of each matrix.
What do I do wrong?