How do I standardize a matrix?

Question

I want to transform a matrix so that its mean is 0 and its variance is 1. I'm using NumPy arrays, so a built-in way to do this would be best, but I can implement it myself if someone can point me to an algorithm.

Edit: never mind, nimrodm has a better implementation.

Define "change". What if, say, we just replace the matrix with the identity matrix or something? What kinds of transformations are OK? — Karl Knechtel, Dec 28 '10 at 07:23
I'm trying to implement a computer vision algorithm that asks for this operation to be performed in the intermediate steps. I think it's because it's a requirement for PCA but I'm not sure. — pnodbnda, Dec 28 '10 at 07:46
Would you consider accepting the answer by @nimrodm so I can delete mine? — John Alexiou, Jul 31 '17 at 20:35

score 76 · Answer 1 · edited Aug 24 '20 at 11:43

The following subtracts the mean of A from each element (the new mean is 0), then normalizes the result by the standard deviation.

import numpy as np
A = (A - np.mean(A)) / np.std(A)

The above standardizes the entire matrix as a whole. If A has more than one dimension and you want to standardize each column individually, specify the axis:

import numpy as np
A = (A - np.mean(A, axis=0)) / np.std(A, axis=0)

Always verify by hand what these one-liners are doing before integrating them into your code. A simple change in the orientation or dimensions of an array can silently and drastically change what numpy computes.
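One way to follow that advice is to check the resulting mean and standard deviation directly. The sketch below uses a small made-up array (the values are illustrative, not from the question) and also covers the row-wise case, where `keepdims=True` is needed so the statistics broadcast against the rows.

```python
import numpy as np

# Illustrative data only: 3 rows (samples) x 2 columns (features).
A = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Whole matrix treated as one set of numbers.
whole = (A - np.mean(A)) / np.std(A)

# Each column standardized on its own.
per_col = (A - np.mean(A, axis=0)) / np.std(A, axis=0)

# Each row standardized on its own; keepdims=True keeps shape (3, 1)
# so the subtraction and division broadcast across each row.
row_mean = np.mean(A, axis=1, keepdims=True)
row_std = np.std(A, axis=1, keepdims=True)
per_row = (A - row_mean) / row_std

# Verify instead of trusting the one-liner.
print(np.mean(whole), np.std(whole))
print(np.mean(per_col, axis=0), np.std(per_col, axis=0))
print(np.mean(per_row, axis=1), np.std(per_row, axis=1))
```

Without `keepdims=True`, the shapes (3, 2) and (3,) do not broadcast and numpy raises an error. For a square matrix the same omission runs without error and standardizes against the wrong axis, which is the kind of silent change the warning is about.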

edited Aug 24 '20 at 11:43 by Espoir Murhabazi
answered Dec 28 '10 at 07:41 by nimrodm

you may want to update `A` only where `std(A) > 0` to avoid division by zero and `NaN` values – Ciprian Tomoiagă Dec 04 '16 at 01:18
Is this possible where A is represented as a list of lists? – Nematode7 Feb 14 '17 at 06:28
@Neamah Why not just [convert](http://stackoverflow.com/questions/10346336/list-of-lists-into-numpy-array) to a numpy array? – kingledion Apr 10 '17 at 12:31
Adding to @nimrodm's answer, this can be implemented in numpy as follows: `import numpy as np; meanArr = np.mean(A); standardized_arr = (A - meanArr) / np.std(A)` – user3585984 Mar 03 '19 at 03:05
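
The division-by-zero concern raised in the comments can be handled with a small guard. This is a minimal sketch, assuming that constant columns should simply come out as zeros; `standardize_columns` is a hypothetical helper name, not a numpy function. It also accepts a list of lists, since `np.asarray` does the conversion.

```python
import numpy as np

def standardize_columns(A):
    """Standardize each column; constant columns become all zeros.

    Hypothetical helper for illustration, not part of NumPy.
    """
    A = np.asarray(A, dtype=float)  # also accepts a list of lists
    centered = A - A.mean(axis=0)
    std = A.std(axis=0)
    # Where std is 0, divide by 1 instead; the centered values there are already 0.
    safe_std = np.where(std > 0, std, 1.0)
    return centered / safe_std

# The middle column is constant, so its standard deviation is 0.
result = standardize_columns([[1, 5, 2], [3, 5, 4], [5, 5, 9]])
print(result)
```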

score 13 · Answer 2 · edited Apr 17 '16 at 16:23

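import numpy as np
import scipy.stats as ss

# zscore works along axis=0 (per column) by default; pass axis=None to use the whole array
A = np.array(ss.zscore(A))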
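
`scipy.stats.zscore` takes `axis` and `ddof` arguments, and its default `ddof=0` corresponds to the population standard deviation that `np.std` also uses by default. The numpy-only sketch below, on made-up data, shows what changing `ddof` does to the result.

```python
import numpy as np

# Illustrative values only.
A = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 13.0]])
n = A.shape[0]

centered = A - A.mean(axis=0)

# ddof=0: divide by n (population standard deviation, numpy's default).
z_population = centered / A.std(axis=0, ddof=0)

# ddof=1: divide by n - 1 (sample standard deviation).
z_sample = centered / A.std(axis=0, ddof=1)

# The two standard deviations differ by the constant factor sqrt(n / (n - 1)).
ratio = A.std(axis=0, ddof=1) / A.std(axis=0, ddof=0)
print(ratio)
```

Which convention is appropriate depends on what the downstream algorithm expects, so it is worth checking before mixing results from different libraries.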

edited Apr 17 '16 at 16:23 by Tunaki
answered Apr 17 '16 at 16:09 by AmanRaj

score 5 · Answer 3 · edited Dec 03 '16 at 20:36

from sklearn.preprocessing import StandardScaler

standardized_data = StandardScaler().fit_transform(your_data)

Example:

>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler

>>> data = np.random.randint(25, size=(4, 4))
>>> data
array([[17, 12,  4, 17],
       [ 1, 16, 19,  1],
       [ 7,  8, 10,  4],
       [22,  4,  2,  8]])

>>> standardized_data = StandardScaler().fit_transform(data)
>>> standardized_data
array([[ 0.63812398,  0.4472136 , -0.718646  ,  1.57786412],
       [-1.30663482,  1.34164079,  1.55076242, -1.07959124],
       [-0.57735027, -0.4472136 ,  0.18911737, -0.58131836],
       [ 1.24586111, -1.34164079, -1.02123379,  0.08304548]])

Works well on large datasets.
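
For this example, the output above can be reproduced with plain numpy: each column is centered on its mean and divided by its population standard deviation (`ddof=0`, numpy's default). A sketch using the same `data` values:

```python
import numpy as np

# Same values as the StandardScaler example above.
data = np.array([[17, 12,  4, 17],
                 [ 1, 16, 19,  1],
                 [ 7,  8, 10,  4],
                 [22,  4,  2,  8]])

# Per-column standardization with the population standard deviation.
manual = (data - data.mean(axis=0)) / data.std(axis=0)
print(np.round(manual, 8))
```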

you could use ctrl+k to indent everything instead of backticks. — Jean-François Fabre, Dec 03 '16 at 19:07

score 2 · Answer 4 · answered Feb 07 '18 at 08:12

Use sklearn.preprocessing.scale.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html

Here is an example.

>>> from sklearn import preprocessing
>>> import numpy as np
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
>>> X_scaled = preprocessing.scale(X_train)
>>> X_scaled
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])

http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

score 0 · Answer 5 · answered Aug 27 '19 at 23:17

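import numpy as np

A = np.array([[1,2,6], [3000,1000,2000]]).T

# Center each column on its mean
A_means = np.mean(A, axis=0)
A_centr = A - A_means
# L2 norm of each centered column (equals std * sqrt(number of rows))
A_norms = np.linalg.norm(A_centr, axis=0)

# Columns now have zero mean and unit norm, so their variance is 1/n, not 1
A_std = A_centr / A_norms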
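
Note that dividing by the column norm is not the same as dividing by the standard deviation: the norm of a centered column equals its standard deviation times sqrt(n), where n is the number of rows. The sketch below checks this on the same array and rescales to unit variance; the names `A_unit` and `A_unit_var` are introduced here for illustration.

```python
import numpy as np

A = np.array([[1, 2, 6], [3000, 1000, 2000]]).T
n = A.shape[0]

A_centr = A - np.mean(A, axis=0)
A_unit = A_centr / np.linalg.norm(A_centr, axis=0)  # unit-norm columns

print(np.linalg.norm(A_unit, axis=0))  # column norms
print(np.var(A_unit, axis=0))          # column variances

# Rescaling by sqrt(n) gives unit variance, matching (A - mean) / std.
A_unit_var = A_unit * np.sqrt(n)
print(np.var(A_unit_var, axis=0))
```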

answered Aug 27 '19 at 23:17 by Alexander Drobyshevsky
