How to store mean vectors and covariance matrices in cells of a data table?

Question

Consider a data table with two numeric features and one categorical feature. I would like to convert it into a new data table in which each row corresponds to one value of the categorical feature. The new table should contain a column holding the mean vector of the numeric features for each categorical value, plus a column holding the corresponding covariance matrix. Additionally, the numeric columns may only be referenced through an object that contains their names.

It seems one has to use lists for this, see for example R - store a matrix into a single dataframe cell. However, this information does not help me enough.

Here is an example:

library(data.table)

set.seed(42)
a <- sample(1:3, 10, TRUE)
b <- rnorm(10)
d <- rpois(10, 3)
data <- data.table(a, b, d)
bd <- c("b", "d")

dat <- data[, 
            .(mu = mean(get(bd)), 
              sigma = get(cov(bd))), 
            by = a]

What I want is that dat has three rows, each corresponding to one value in a. This data table should also contain a column with three vectors of length 2 and a column with three 2x2 matrices.

The use of `cov` is not clear. It requires a `matrix` – akrun Aug 19 '21 at 18:31

akrun · Accepted Answer · 2021-08-19T21:19:58.280

We can use `mget` instead of `get`: `get` returns the value of a single object, whereas `mget` returns the values of one or more objects as a list.

data[, lapply(mget(bd), function(x) mean(x)), by = a]

If we need a list column:

data[, .(mu = .(as.list(lapply(mget(bd), function(x) mean(x))))), by = a]

If we want both columns, i.e. the covariance as well:

data[, .(mu = .(sapply(mget(bd), function(x) mean(x))), 
       sigma = .(cov(do.call(cbind, mget(bd)))[2])), by = a]
   a                  mu     sigma
1: 1 0.2353046,2.2000000 -2.131663
2: 2 0.1876238,3.3333333  2.062627
3: 3 0.9299794,1.5000000 0.1445644
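
If the whole covariance matrix is needed in each cell rather than a single entry, the `[2]` subset can be dropped. The following is a minimal sketch under that assumption, reusing the example data from the question. The second variant, based on `.SD` and `.SDcols`, is not part of the original answer; it is shown only as one possible alternative that still refers to the numeric columns solely through `bd`.

```r
library(data.table)

# Example data from the question
set.seed(42)
a <- sample(1:3, 10, TRUE)
b <- rnorm(10)
d <- rpois(10, 3)
data <- data.table(a, b, d)
bd <- c("b", "d")

# Variant 1: mget() as in the answer, keeping the full covariance matrix.
# Wrapping a value in .() (an alias for list()) stores it as one list cell.
dat <- data[, .(mu    = .(sapply(mget(bd), mean)),
                sigma = .(cov(do.call(cbind, mget(bd))))),
            by = a]

# Variant 2 (assumed alternative): restrict .SD to the columns named in bd.
dat_sd <- data[, .(mu    = .(colMeans(.SD)),
                   sigma = .(cov(.SD))),
               by = a, .SDcols = bd]

# Each cell holds a whole object; extract one with [[ ]]
dat$mu[[1]]     # mean vector for the first group
dat$sigma[[1]]  # covariance matrix for the first group
```

Both `mu` and `sigma` are list columns, so individual vectors and matrices are retrieved with `[[`, not `[`.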

edited Aug 19 '21 at 21:19

answered Aug 19 '21 at 18:28

akrun

Thank you for the quick response. This solution results in two mean columns. However, I would like to have one column which contains not single values but vectors of means. – Fire Salamander Aug 19 '21 at 18:44
@FireSalamander you meant as a `list`? then wrap it in a `list` – akrun Aug 19 '21 at 18:46
This goes in the right direction. I was thinking more of something like `data[, .(mu = .(sapply(mget(bd), function(x) mean(x)))), by = a]` where each list element is a vector. Maybe now it becomes clearer what I want to achieve with the `cov()` function. The third column should contain the covariance matrices that belong to the means. I tried `data[, .(mu = .(sapply(mget(bd), function(x) mean(x))), sigma = .(cov(mget(bd)))), by = a]` but this doesn't work. – Fire Salamander Aug 19 '21 at 19:26
@FireSalamander please check the updated one – akrun Aug 19 '21 at 21:20
Perfect. Subsetting was actually not necessary because I need the whole matrix, so `sigma = .(cov(do.call(cbind, mget(bd))))` is what I want to have. Thank you very much for your help. – Fire Salamander Aug 20 '21 at 06:23
