Remove mean from numpy matrix

Question

I have a numpy matrix A where the data is organised column-wise, i.e. A[:,0] is the first data vector, A[:,1] is the second, and so on. I wanted to know whether there is a more elegant way to zero out the mean of this data. I am currently doing it via a for loop:

mean=A.mean(axis=1)
for k in range(A.shape[1]):
    A[:,k]=A[:,k]-mean

So does numpy provide a function to do this? Or can it be done more efficiently another way?

score 38 · Accepted Answer · edited May 23 '17 at 12:00

As is typical, you can do this in a number of ways. Each of the approaches below works by adding a dimension to the mean vector, making it a 4 x 1 array, and then NumPy's broadcasting takes care of the rest. The first three approaches create a view of mean rather than a copy; the last changes the shape of mean itself. The first approach (i.e., using newaxis) is likely preferred by most, but the other methods are included for the record.

In addition to the approaches below, see also ovgolovin's answer, which uses a NumPy matrix to avoid the need to reshape mean altogether.

For the methods below, we start with the following code and example array A.

import numpy as np

A = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
mean = A.mean(axis=1)
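
To see why the extra dimension is needed, here is a minimal sketch using the same example array. The names `broadcast_failed`, `column` and `centered` are only for illustration and are not part of the original answer.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)

print(A.shape)     # (4, 3)
print(mean.shape)  # (4,)

# Broadcasting aligns shapes from the trailing axis, so (4, 3) against (4,)
# compares 3 with 4 and fails.
try:
    A - mean
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# With a trailing axis of length 1, (4, 3) against (4, 1) is compatible:
# the single column is stretched across all three columns of A.
column = mean[:, np.newaxis]
centered = A - column
print(column.shape)  # (4, 1)
```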

Using `numpy.newaxis`

>>> A - mean[:, np.newaxis]
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

Using `None`

The documentation states that None can be used instead of newaxis. This is because

>>> np.newaxis is None
True

Therefore, the following accomplishes the task.

>>> A - mean[:, None]
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

That said, newaxis is clearer and should be preferred. A case can also be made that newaxis is more future-proof. See also: Numpy: Should I use newaxis or None?

Using `ndarray.reshape`

>>> A - mean.reshape((mean.shape[0], 1))
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])
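
As a side note, `reshape` also accepts `-1` for one dimension, which lets NumPy infer the length. The sketch below (variable names are illustrative) compares it with the `newaxis` form and checks that, for this contiguous 1-D array, both results share memory with `mean` instead of copying it.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)

col_newaxis = mean[:, np.newaxis]
col_reshape = mean.reshape(-1, 1)  # -1 means "infer this length from the data"

same_shape = col_newaxis.shape == col_reshape.shape

# np.shares_memory reports whether two arrays overlap in memory,
# which is the case when one is a view of the other.
shares_newaxis = np.shares_memory(mean, col_newaxis)
shares_reshape = np.shares_memory(mean, col_reshape)

centered = A - col_reshape
```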

Changing `ndarray.shape` directly

You can alternatively change the shape of mean directly. Note that this modifies mean in place, so any later code that uses mean sees the 4 x 1 shape.

>>> mean.shape = (mean.shape[0], 1)
>>> A - mean
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

answered Dec 07 '11 at 22:02

David Alber

The usual way to express this kind of reshape in NumPy is to use [`np.newaxis`](http://www.scipy.org/Numpy_Example_List#newaxis): `A - mean[:, np.newaxis]`. – Sven Marnach Dec 07 '11 at 23:02
@SvenMarnach Updated the answer to use `np.newaxis`. Thanks for your input. – David Alber Dec 08 '11 at 01:19
Note that `None` can also be used (i.e., `A - mean[:, None]`, see [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#numpy.newaxis)). This is because `numpy.newaxis` is `None`, but `np.newaxis` is clearer and is probably more future proof (also see http://stackoverflow.com/questions/944863/numpy-should-i-use-newaxis-or-none). – David Alber Dec 08 '11 at 01:21
This is one of the many reasons that numpy rocks. In MATLAB, the command would be: bsxfun(@minus, A, mean(A, 2)). I think "A - mean(A, axis=1)[:, np.newaxis]" is a lot easier to read and remember. Also, note that np.newaxis is None. – Carl F. Dec 08 '11 at 02:29
Another way is to pass `keepdims=True` to `.mean()`. By default, `.mean()` removes the dimension it averages over (given by the `axis` argument); `keepdims=True` stops it from doing that: `A = np.array([[1, 2, 3], [4, 5, 6]])`, `mean = A.mean(axis=1, keepdims=True)`, `A = A - mean`. – rbgb Jun 17 '17 at 01:32
I don't know why you'd ever want to call a variable the name of a common function `mean` – jeffery_the_wind Nov 01 '19 at 15:10
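
The `keepdims=True` suggestion from the comments can be sketched as follows. This is only an illustration: the name `row_means` is my own choice (it also sidesteps the naming concern raised above), and the array is created as float on purpose, because an in-place `-=` on an integer array cannot store the fractional result and NumPy refuses the cast.

```python
import numpy as np

# float dtype so the in-place subtraction below can store the result
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# keepdims=True keeps the averaged axis with length 1, giving shape (4, 1)
row_means = A.mean(axis=1, keepdims=True)

# In-place update, the vectorised counterpart of the loop in the question
A -= row_means
print(A)
```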

score 11 · Answer 2 · answered Dec 07 '11 at 22:13

You can also use matrix instead of array. Then you won't need to reshape:

>>> A = np.matrix([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
>>> m = A.mean(axis=1)
>>> A - m
matrix([[-1.,  0.,  1.],
        [-1.,  0.,  1.],
        [-1.,  0.,  1.],
        [-1.,  0.,  1.]])
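
This works because a matrix stays two-dimensional after the reduction, so the mean already has shape 4 x 1. Keep in mind that the NumPy documentation no longer recommends the `matrix` class for new code. A small sketch of the shape difference and of a plain-array equivalent (variable names are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
M = np.matrix(A)

matrix_mean_shape = M.mean(axis=1).shape  # (4, 1): matrices are always 2-D
array_mean_shape = A.mean(axis=1).shape   # (4,): the averaged axis is dropped

from_matrix = np.asarray(M - M.mean(axis=1))

# Same result with a plain ndarray, keeping the averaged axis explicitly
from_array = A - A.mean(axis=1, keepdims=True)
```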

ovgolovin

I didn't know matrices did that. +1. – Carl F. Dec 08 '11 at 02:30

score 5 · Answer 3 · answered Jun 01 '18 at 20:35

Some of these answers look pretty old. I just tested this on numpy 1.13.3:

>>> import numpy as np
>>> a = np.array([[1,1,3],[1,0,4],[1,2,2]])
>>> a
array([[1, 1, 3],
       [1, 0, 4],
       [1, 2, 2]])
>>> a = a - a.mean(axis=0)
>>> a
array([[ 0.,  0.,  0.],
       [ 0., -1.,  1.],
       [ 0.,  1., -1.]])

I think this is much cleaner and simpler. Give it a try and let me know if it is somehow inferior to the other answers.
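
One thing to check before using this form: `axis=0` removes the mean of each column, whereas the question and the accepted answer use `axis=1`. With `axis=0` the mean has one entry per column, which lines up with the trailing axis, so no reshape is needed. The sketch below (array `b` and all variable names are illustrative) shows the difference, and also that on a square array the un-reshaped `axis=1` version runs without error but gives a different result.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Shape (3,) broadcasts against (4, 3) directly: each column gets mean zero.
col_centered = A - A.mean(axis=0)
col_means_after = col_centered.mean(axis=0)
row_means_after = col_centered.mean(axis=1)  # generally not zero

# Square case: shapes (3, 3) and (3,) are compatible, so nothing is raised,
# but the row means are subtracted along the wrong axis.
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
unreshaped = b - b.mean(axis=1)
reshaped = b - b.mean(axis=1, keepdims=True)
```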

score 5 · Answer 4 · answered Dec 08 '11 at 08:07

Yes. pylab.demean:

In [1]: X = scipy.rand(2,3)

In [2]: X.mean(axis=1)
Out[2]: array([ 0.42654669,  0.65216704])

In [3]: Y = pylab.demean(X, axis=1)

In [4]: Y.mean(axis=1)
Out[4]: array([  1.85037171e-17,   0.00000000e+00])

Source:

In [5]: pylab.demean??
Type:           function
Base Class:     <type 'function'>
String Form:    <function demean at 0x38492a8>
Namespace:      Interactive
File:           /usr/lib/pymodules/python2.7/matplotlib/mlab.py
Definition:     pylab.demean(x, axis=0)
Source:
def demean(x, axis=0):
    "Return x minus its mean along the specified axis"
    x = np.asarray(x)
    if axis == 0 or axis is None or x.ndim <= 1:
        return x - x.mean(axis)
    ind = [slice(None)] * x.ndim
    ind[axis] = np.newaxis
    return x - x.mean(axis)[ind]
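
As far as I know, `demean` has since been removed from `matplotlib.mlab` (and `scipy.rand` from SciPy), so the session above may not run on a current installation. A rough stand-in, assuming only NumPy, is sketched below. This `demean` is a hypothetical helper written for illustration, not the Matplotlib implementation, and it relies on `keepdims=True` instead of building an index by hand.

```python
import numpy as np


def demean(x, axis=0):
    """Return x minus its mean along the given axis (illustrative helper)."""
    x = np.asarray(x)
    # keepdims=True leaves the averaged axis in place with length 1,
    # so the result broadcasts against x for any axis.
    return x - x.mean(axis=axis, keepdims=True)


rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
X = rng.random((2, 3))
Y = demean(X, axis=1)
print(Y.mean(axis=1))  # values very close to zero
```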

Steve, could you please also add the modules that you imported? – pratikm Dec 10 '11 at 02:22
