
I am trying to compute a z-score for every cell of a matrix in R, using the mean and standard deviation of that cell's row. I am new to the more complex functions like apply(), so I am not sure of the best way to do this without writing a nested for loop by hand. Exp is an expression matrix and will be very large, so a nested for loop will take a long time. Please excuse the rough syntax. I need to retain the column names.

Dat<-for (i in 1:length(nrow(Exp))) {
for (j in 1:length(ncol(Exp))) {
(Exp[,j]-rowMean(Ex[i,]))/rowSds[i,]
}
}
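For reference, a corrected version of the loop above might look like the sketch below. It assumes `Exp` is a numeric matrix; the small `Exp` built here is a hypothetical stand-in. It replaces `1:length(nrow(Exp))` (always just 1) with `seq_len()`, fixes the `Ex`/`rowMean` typos and the `rowSds[i,]` indexing, and fills a preallocated copy of `Exp` instead of assigning the loop itself to `Dat` (a for loop returns NULL), so the column names carry over. This is the slow baseline that the answers replace with vectorized code.

```r
# Hypothetical small stand-in for the expression matrix
set.seed(1)
Exp <- matrix(rnorm(12), nrow = 3,
              dimnames = list(NULL, paste0("sample", 1:4)))

# Preallocate the result as a copy of Exp so dimnames (colnames) are kept
Dat <- Exp
for (i in seq_len(nrow(Exp))) {
  mu <- mean(Exp[i, ])   # mean of row i
  s  <- sd(Exp[i, ])     # standard deviation of row i
  for (j in seq_len(ncol(Exp))) {
    Dat[i, j] <- (Exp[i, j] - mu) / s
  }
}
Dat
```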

If I use apply() over rows, I don't retain the column names, and if I use apply() over columns, the wrong means and standard deviations are used. For each cell, I need a z-score computed from the mean and standard deviation of that cell's row, while keeping the column names.
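On the apply() observation: with MARGIN = 1, apply() stores each row's result as a column of the output, so the result comes back transposed and the column names end up as row names. Transposing back with t() restores the original orientation. A minimal sketch, again with a small hypothetical `Exp`:

```r
# Hypothetical small stand-in for the expression matrix
set.seed(1)
Exp <- matrix(rnorm(12), nrow = 3,
              dimnames = list(NULL, paste0("sample", 1:4)))

# apply() over rows returns one column per input row (a 4 x 3 matrix here)
byRow <- apply(Exp, 1, function(r) (r - mean(r)) / sd(r))
dim(byRow)        # 4 3: transposed relative to Exp
rownames(byRow)   # the original column names now label the rows

# Transpose back to get row-wise z-scores with the column names intact
Z <- t(byRow)
colnames(Z)
```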

Any direction to resources or help would be appreciated. Thanks!

Manninm
  • Not clear about the column-wise vs. row-mean part. Can you please show a small reproducible example? – akrun Mar 24 '20 at 17:25
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 24 '20 at 17:25
  • `length(nrow(...))` will always return 1, since `nrow` returns a single number and the `length` of that single number is 1. Even if the frame has 0 rows, `length(nrow(mtcars[0,]))` still returns 1, certainly not what you need. (The exception is when `nrow` returns `NULL`, e.g. for a plain vector.) For robust programming, consider replacing `1:length(...)` with `seq_len(nrow(Exp))`. – r2evans Mar 24 '20 at 17:33
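The point about `1:length(...)` can be checked directly with the built-in `mtcars` data: a zero-row frame still produces a loop index of 1, whereas `seq_len()` produces an empty sequence, so the loop body never runs.

```r
empty <- mtcars[0, ]          # data frame with 0 rows

length(nrow(empty))           # 1: nrow() returns a single number (0)
1:length(nrow(empty))         # 1: a loop over this would run once
seq_len(nrow(empty))          # integer(0): a loop over this is skipped

# Count how many times a seq_len() loop actually iterates
n_iter <- 0
for (i in seq_len(nrow(empty))) n_iter <- n_iter + 1
n_iter                        # 0
```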

4 Answers


First, do not use apply(). You can create your own colVars function using colMeans. That's what I did, and that function now exists in C++: it is Rfast::colVars(x).
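The idea of building a variance function from the means can be sketched in base R for the row-wise case this question needs. The function name `rowSdsBase` is hypothetical (it is not Rfast's API); it computes each row's sample standard deviation from `rowMeans()` and `rowSums()` without apply() or an explicit loop, and is checked against `apply(x, 1, sd)`.

```r
# Hypothetical helper: row-wise sample standard deviation using only
# vectorized base functions (no apply, no explicit loop)
rowSdsBase <- function(x) {
  mu <- rowMeans(x)              # one mean per row
  ss <- rowSums((x - mu)^2)      # x - mu recycles mu down each column, i.e. per row
  sqrt(ss / (ncol(x) - 1))       # n - 1 denominator, matching sd()
}

set.seed(42)
mtx <- matrix(rnorm(200), ncol = 10)
all.equal(rowSdsBase(mtx), apply(mtx, 1, sd))   # TRUE
```

The z-scores then follow as `(mtx - rowMeans(mtx)) / rowSdsBase(mtx)`.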

Michail

We can speed this up (and make it more readable).

Fake data:

set.seed(42)
mtx <- matrix(rnorm(200), ncol=10)
mtx[1:4,1:3]
#            [,1]       [,2]       [,3]
# [1,]  1.3709584 -0.3066386  0.2059986
# [2,] -0.5646982 -1.7813084 -0.3610573
# [3,]  0.3631284 -0.1719174  0.7581632
# [4,]  0.6328626  1.2146747 -0.7267048

We can calculate the row-wise mean and standard deviation with:

rowSigma <- apply(mtx, 1, sd, na.rm = TRUE)
rowMu <- rowMeans(mtx, na.rm = TRUE)

(I'm inferring na.rm=TRUE here ... though it might not be relevant for your data.)

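From here, it helps to know that element-wise arithmetic between a matrix and a vector (as opposed to linear-algebra operations such as %*%) recycles the vector down the columns, because R stores matrices in column-major order. A vector with one value per row therefore lines up with the rows. To demonstrate: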

m <- matrix(1:9, nrow = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9
m + 1:3
#      [,1] [,2] [,3]
# [1,]    2    5    8
# [2,]    4    7   10
# [3,]    6    9   12
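The 3 x 3 example is square, so as an extra check (a small sketch, not part of the original answer) here is a non-square matrix with hypothetical column names: subtracting a vector with one entry per row centers each row, and the column names are kept.

```r
m2 <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
m2
#      a b c
# [1,] 1 3 5
# [2,] 2 4 6

# rowMeans(m2) is c(3, 4); it is recycled down each column, so row i loses its own mean
centered <- m2 - rowMeans(m2)
centered
#       a b c
# [1,] -2 0 2
# [2,] -2 0 2
```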

With that confidence, we can now simply do

(mtx - rowMu) / rowSigma
#         [,1]  [,2]    [,3]    [,4]   [,5]    [,6]   [,7]  [,8]  [,9]  [,10]
#  [1,]  1.259 -0.55  0.0051 -0.6119  1.412  1.0760 -1.824 -0.31 -0.41 -0.054
#  [2,] -0.049 -1.48  0.1907  0.8327  0.918  1.8429 -1.113 -0.43 -0.64 -0.071
#  [3,]  0.543 -0.49  1.3088  0.9670  0.011 -2.1048  0.081 -1.02  0.16  0.554
#  [4,]  0.336  0.95 -1.1044  1.1492 -0.462  1.6248 -1.391 -0.37 -0.72 -0.022
#  [5,]  0.604  2.16 -1.2403 -0.5734 -1.059 -0.5104  0.181 -0.25  0.80 -0.107
#  [6,] -0.426 -0.79  0.1849  1.1712  0.388 -0.1863 -0.792  0.96  1.32 -1.821
#  [7,]  2.137 -0.17 -0.8968  0.6015 -0.121 -0.3886 -0.639 -0.47 -1.13  1.078
#  [8,]  0.017 -1.49  1.4082  1.0414 -0.063 -0.0085 -1.729 -0.29  0.51  0.603
#  [9,]  1.828  0.19 -0.7497  0.6731  0.686 -0.0977 -1.584  0.44 -0.21 -1.176
# [10,] -0.078 -0.76  0.7643  0.8408  0.959  0.1352  0.206 -1.24  1.05 -1.874
# [11,]  1.416  0.23  0.0434 -1.8632  1.538 -0.4413  0.387 -0.46 -0.73 -0.120
# [12,]  2.145  0.65 -0.7603 -0.1039 -0.469  0.0837 -0.485 -1.49  0.77 -0.345
# [13,] -1.427  0.79  1.2895  0.4169  0.442 -0.5993 -0.154  0.92 -1.75  0.077
# [14,] -0.358 -0.68  0.5283 -1.0064  1.248 -0.5745  0.990 -0.35  1.53 -1.334
# [15,]  0.068  0.74  0.3022 -0.3631 -0.960 -1.5391  1.722 -0.28  1.12 -0.801
# [16,]  0.993 -1.54  0.6062  0.9338 -0.618 -0.1029 -0.872 -1.02  0.15  1.477
# [17,] -0.055 -0.73  1.2414  1.3609 -1.195 -0.3619  0.170  0.32 -1.62  0.871
# [18,] -1.765 -0.56  0.0652  0.3143 -0.967  1.8055  0.806 -0.53  0.43  0.396
# [19,] -1.057 -1.04 -1.4293 -0.0093  0.642 -0.3303  0.271  0.23  0.91  1.811
# [20,]  1.498 -0.33  0.0227 -1.9499  0.547 -0.1876 -0.458  1.45 -0.39 -0.200

where each value is the z-score of the original data on a per-row basis.

(mtx[1,1] - rowMu[1]) / rowSigma[1]
# [1] 1.26
(mtx[2,3] - rowMu[2]) / rowSigma[2]
# [1] 0.191
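If relying on recycling feels too implicit, base R's `sweep()` states the margin explicitly and gives the same result. The sketch below recreates `mtx`, `rowMu` and `rowSigma` so it runs on its own, and adds hypothetical column names to show that they survive, which the question requires.

```r
set.seed(42)
mtx <- matrix(rnorm(200), ncol = 10)
colnames(mtx) <- paste0("sample", 1:10)   # hypothetical column names

rowMu    <- rowMeans(mtx, na.rm = TRUE)
rowSigma <- apply(mtx, 1, sd, na.rm = TRUE)

# Recycling version from above
z1 <- (mtx - rowMu) / rowSigma

# sweep() with MARGIN = 1: one statistic per row, applied across that row
z2 <- sweep(sweep(mtx, 1, rowMu, "-"), 1, rowSigma, "/")

all.equal(z1, z2)   # TRUE
colnames(z1)        # column names are preserved
```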
r2evans

We can use rowMeans() together with rowSds() from the matrixStats package:

library(matrixStats)
(mtx - rowMeans(mtx))/rowSds(mtx)

data

set.seed(42)
mtx <- matrix(rnorm(200), ncol=10)
akrun

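I would use the scale() function, which does exactly this. Note that scale() centers and scales columns (so apply(mtx, 2, scale) gives column-wise z-scores); for row-wise z-scores, transpose before and after:

mtx <- matrix(rnorm(100), ncol = 2)
mtx_z <- t(scale(t(mtx)))  # row-wise z-scores; column names are kept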
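Because scale() works column by column, row-wise z-scores need a transpose before and after. A self-contained check of that transpose approach against the explicit row-wise formula, using hypothetical column names:

```r
set.seed(42)
mtx <- matrix(rnorm(200), ncol = 10)
colnames(mtx) <- paste0("sample", 1:10)   # hypothetical column names

# scale() standardizes columns, so transpose, scale, transpose back
z_scale <- t(scale(t(mtx)))

# Explicit row-wise formula for comparison
z_manual <- (mtx - rowMeans(mtx)) / apply(mtx, 1, sd)

# scale() attaches "scaled:center"/"scaled:scale" attributes, so ignore attributes here
all.equal(z_scale, z_manual, check.attributes = FALSE)   # TRUE
colnames(z_scale)   # column names are preserved
```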