How to run a for loop or apply a function on a data.table?

Question

In the following toy example, I'd like to multiply each column of a data.table (dt.adj) by the corresponding element of a vector (sd).

Toy data

F <- matrix(c(1.0, 0.7,  0.7,  0.5,
              0.7, 1.0,  0.95, 0.3,
              0.7, .95,  1.5,  0.3,
              0.5, 0.3,  0.3,  1.25), nrow=4, ncol=4)

sd <- round(sqrt(diag(F)),2)

> sd
[1] 1.00 1.00 1.22 1.12

Data generation

library(MASS)  # provides mvrnorm()

set.seed(123)
mu <- rep(0, nrow(F))
df.sim <- as.data.frame(mvrnorm(100, mu, F))
cols <- paste0("f", 1:4)
colnames(df.sim) <- cols
df.adj <- as.data.frame(scale(df.sim))


> head(df.adj, 3)
         f1          f2         f3         f4
1  1.654323  0.74106703  0.2938761 -0.6016496
2  1.067877  0.07888542 -0.1097025  0.2326684
3 -1.283677 -1.75660825 -1.1412116 -0.8720620

With a data.frame (df.adj), multiplying each column by the corresponding element of the vector sd is straightforward, for example with a for loop:

for (col in 1:dim(F)[1]) {
  df.adj[, col] <- df.adj[, col] * sd[col]
}

Thus, the covariance matrix of the adjusted data frame, var(df.adj), is roughly similar to the original matrix F:

> var(df.adj)
          f1        f2        f3        f4
f1 1.0000000 0.6725119 0.6625572 0.4996134
f2 0.6725119 1.0000000 0.8967235 0.3104626
f3 0.6625572 0.8967235 1.4884000 0.2134854
f4 0.4996134 0.3104626 0.2134854 1.2544000
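
The reason the result resembles F can be checked directly: scaling standardized columns by sd gives a covariance matrix equal to D R D, where R is the sample correlation matrix and D = diag(sd). The following is a minimal sketch in base R, assuming hypothetical toy data (x) and a copy of the sd vector (s) rather than the simulated data above:

```r
# Sketch under stated assumptions: x is made-up toy data, s copies the sd vector above.
set.seed(1)
x <- matrix(rnorm(400), nrow = 100, ncol = 4)
s <- c(1.00, 1.00, 1.22, 1.12)

z   <- scale(x)                # columns with mean 0 and variance 1
adj <- sweep(z, 2, s, "*")     # multiply column j by s[j]

D <- diag(s)
# The covariance of the rescaled columns equals D %*% R %*% D
all.equal(var(adj), D %*% cor(x) %*% D, check.attributes = FALSE)
# The diagonal therefore equals s^2, e.g. 1.22^2 = 1.4884
all.equal(diag(var(adj)), s^2)
```

The diagonal matches sd^2 exactly, while the off-diagonal entries depend on the sample correlations, which is why var(df.adj) is close to F but not identical to it.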

However, I want to work with a data.table instead of a data.frame. In this case:

library(data.table)

dt.sim <- as.data.table(mvrnorm(100, mu, F))
setnames(dt.sim, cols)
dt.adj <- as.data.table(scale(dt.sim))

Unfortunately, with a data.table I can't figure out how to perform the previous calculation. For instance:

for (col in 1:dim(F)[1]) set(dt.adj, j = col, value= dt.adj[, col]*sd[col])

Error in set(dt.adj, j = col, value = dt.adj[, col] * sd[col]) : 
  dt passed to assign isn't type VECSXP

I'm sure there are several efficient ways (for example, using lapply()) to carry out this operation, so any help will be greatly appreciated.

What are the specific reasons to switch from _matrix_ to _data.table_ for this kind of operation? If you can stay with _matrix_ operations, [this answer](https://stackoverflow.com/a/17080448/3817004) suggests using `mat.adj %*% diag(sd)`. – Uwe, Mar 21 '22 at 08:58

Waldi · Accepted Answer · 2022-03-21T12:13:17.080


With a for loop:

for (col in seq_along(cols)) set(dt.adj, j = col, value = dt.adj[[col]] * sd[col])

head(dt.adj,3)
           f1          f2         f3         f4
        <num>       <num>      <num>      <num>
1:  1.6543232  0.74106703  0.3585289 -0.6738476
2:  1.0678772  0.07888542 -0.1338370  0.2605886
3: -1.2836767 -1.75660825 -1.3922781 -0.9767094

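This solution uses the [[ ]] list syntax because a data.table is internally a list of columns, so dt.adj[[col]] returns column col as a plain vector that set() can assign back by reference.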
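
Since the question also asks about an apply-style approach, here is a minimal sketch of one loop-free alternative, assuming hypothetical toy data (dt) and a copy of the sd vector (s). It pairs each column with its multiplier through Map() and assigns all columns at once with `:=`:

```r
library(data.table)

# Sketch under stated assumptions: dt is made-up toy data, s copies the sd vector.
set.seed(1)
cols <- paste0("f", 1:4)
s <- c(1.00, 1.00, 1.22, 1.12)
dt <- as.data.table(matrix(rnorm(40), ncol = 4))
setnames(dt, cols)

# Reference result computed in base R before dt is modified
expected <- sweep(as.matrix(dt), 2, s, "*")

# Map() pairs column k of .SD with s[k] and returns a list of new columns;
# := then overwrites the columns named in cols by reference.
dt[, (cols) := Map("*", .SD, s), .SDcols = cols]

all.equal(as.matrix(dt), expected)
```

Both this form and the set() loop update the table by reference. The set() loop is usually the lower-overhead choice when there are many columns, while the Map() form keeps the operation in a single expression.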

edited Mar 21 '22 at 12:13

answered Mar 19 '22 at 06:56


This result corresponds to dt.adj before multiplication by sd. For example, the correct element is `df.adj[1, 4] = -0.6726647`, which results from `df.adj[1,4]*sd[4]=-0.6016496*1.118034=-0.6726647`. – Rob Mar 20 '22 at 23:40
OK, got it, see solution with for loop – Waldi Mar 21 '22 at 12:13
