Subtract the mean of rows from an array

Question

I'm trying to subtract the column average of an array from the respective column of the array using slicing and broadcasting. I don't understand how to transpose, or why I need to. Right now, given the array Y, I have:

    Y_avg = Y.mean(axis=0)
    Z = (Y.T - Y_avg).T

This is supposed to create an array whose column-wise average is 0, but that's not what I am getting.

This works just fine for me. The reason you need to transpose is because of how numpy internally broadcasts array shapes. If you tried to do `Y - Y_avg` directly, it would not have the correct shapes to perform the operation since `.mean()` on an axis effectively drops a dimension. Then once you perform the operation on a transposed `Y`, you transpose the result back to the original shape of `Y`. — Philip Ciunkiewicz, Jul 01 '20 at 21:58

score 0 · Answer 1 · answered Jul 01 '20 at 21:55

And what are you getting? Initializing an array and taking the mean with axis=0 (the only axis, because this is a 1D array) works as intended.

    import numpy as np

    Y = np.array([1, 2, 3])
    Y_avg = Y.mean(axis=0)  # scalar mean of the 1D array
    print(Y - Y_avg)        # print() call so this runs on Python 3

This outputs [-1. 0. 1.] as expected.

Hans Musgrave · Answer 2 · 2020-07-01T22:12:05.807

What you're seeing is that taking the mean along an axis drops a dimension, moving the data from shape (n, k) to shape (n,). This isn't compatible with (n, k) for broadcasting a subtraction. Plenty has been written on that, e.g. here https://stackoverflow.com/a/24564015/3798897
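To make the shape mismatch concrete, here is a minimal sketch. The 2x3 array and its values are assumptions chosen purely for illustration; they are not from the question.

```python
import numpy as np

# Hypothetical example data: n=2 rows, k=3 columns
Y = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

row_means = Y.mean(axis=1)  # shape (2,): one mean per row
col_means = Y.mean(axis=0)  # shape (3,): one mean per column

# Broadcasting aligns trailing dimensions, so (2, 3) - (3,) works
col_centered = Y - col_means

# (2, 3) - (2,) does not broadcast, because the trailing sizes 3 and 2 differ
try:
    Y - row_means
    row_subtraction_failed = False
except ValueError:
    row_subtraction_failed = True
```

Note that when the array happens to be square (n == k), the second subtraction would not raise an error; it would silently subtract along the wrong axis, which is worth keeping in mind when a result looks wrong rather than failing.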

Instead of multiple transposes it might be more convenient to reshape the averages so that they're broadcastable:

    # Transform the single-dimension mean into a 2D column vector
    Y_avg = Y.mean(axis=1).reshape(-1, 1)
    Z = Y - Y_avg
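As an alternative to reshaping, `mean` accepts a `keepdims=True` argument that leaves the reduced axis in place with length 1. The sketch below compares the two approaches on a small array; the data values are assumed for illustration only.

```python
import numpy as np

# Hypothetical example data
Y = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

# keepdims=True keeps the reduced axis with length 1, so the mean has
# shape (2, 1) and broadcasts against the (2, 3) array
Z_rows = Y - Y.mean(axis=1, keepdims=True)

# Equivalent to the reshape approach
Z_reshape = Y - Y.mean(axis=1).reshape(-1, 1)

# For column means, axis=0 gives shape (3,), which broadcasts directly
Z_cols = Y - Y.mean(axis=0)
```

After this, each row of `Z_rows` averages to 0, and each column of `Z_cols` averages to 0, so which axis to use depends on whether you want row means or column means removed.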
