Is there a way of elegantly calculating the correlations between values if those values are stored by group in a single column of a data.table (other than converting the data.table to a matrix)?
library(data.table)
set.seed(1) # reproducibility
dt <- data.table(id=1:4, group=rep(letters[1:2], c(4,4)), value=rnorm(8))
setkey(dt, group)
# id group value
# 1: 1 a -0.6264538
# 2: 2 a 0.1836433
# 3: 3 a -0.8356286
# 4: 4 a 1.5952808
# 5: 1 b 0.3295078
# 6: 2 b -0.8204684
# 7: 3 b 0.4874291
# 8: 4 b 0.7383247
Something that works, but requires the group names as input:
cor(dt["a"]$value, dt["b"]$value)
# [1] 0.1556371
I'm looking more for something like:
dt[, cor(value, value), by="group"]
But that does not give me the correlation(s) I'm after.
Here's the same problem for a matrix with the correct results.
set.seed(1) # reproducibility
m <- matrix(rnorm(8), ncol=2)
dimnames(m) <- list(id=1:4, group=letters[1:2])
# group
# id a b
# 1 -0.6264538 0.3295078
# 2 0.1836433 -0.8204684
# 3 -0.8356286 0.4874291
# 4 1.5952808 0.7383247
cor(m) # correlations between groups
# a b
# a 1.0000000 0.1556371
# b 0.1556371 1.0000000
Any comments or help greatly appreciated.