0

So I want to implement a matrix standardisation method. To do that, I've been told to

subtract the mean and divide by the standard deviation for each dimension

And to verify:

after this processing, each dimension has zero mean and unit variance.

That sounds simple enough ...

import numpy as np
def standardize(X : np.ndarray,inplace=True,verbose=False,check=False):

    ret = X
    if not inplace:
        ret = X.copy()

    ndim = np.ndim(X)

    for d in range(ndim):
        m = np.mean(ret,axis=d)
        s = np.std(ret,axis=d)

        if verbose:
            print(f"m{d} =",m)
            print(f"s{d} =",s)

        # TODO: handle zero s
        # TODO: subtract m along the correct axis
        # TODO: divide by s along the correct axis

    if check:    
        means = [np.mean(X,axis=d) for d in range(ndim)]
        stds  = [np.std(X,axis=d)  for d in range(ndim)]
        if verbose:
            print("means=\n",means)
            print("stds=\n",stds)

        assert all(all(m < 1e-15 for m in mm) for mm in means)
        assert all(all(s == 1.0 for s in ss) for ss in stds)

    return ret

e.g. for ndim == 2, we could get something like

A=
 [[ 0.40923704  0.91397416  0.62257397]
  [ 0.15614258  0.56720836  0.80624135]]
m0 = [ 0.28268981  0.74059126  0.71440766]  # can broadcast with ret -= m0
s0 = [ 0.12654723  0.1733829   0.09183369]  # can broadcast with ret /= s0
m1 = [ 0.33333333 -0.33333333]  # ???
s1 = [ 0.94280904  0.94280904]  # ???

How do I do that?

Judging by Broadcast an operation along specific axis in python , I thought I may be looking for a way to create

m[None, None, None, .., None, : , None, None, .., None]

Where there is exactly one : at index d.

But even if I knew how to do that, I'm not sure it'd work.

Community
  • 1
  • 1
User1291
  • 7,664
  • 8
  • 51
  • 108
  • Put that `np.append` down carefully, and back away. It's dangerous. http://stackoverflow.com/questions/42563335/how-to-append-a-selection-of-a-numpy-array-to-an-empty-numpy-array – hpaulj Mar 03 '17 at 08:45
  • @hpaulj didn't work, anyway. =) Also tried to "hack" the 2D-case and found that the index of the ``:`` doesn't meet my expectations. – User1291 Mar 03 '17 at 08:53
  • 1
    Use `keepdims` and thus avoid all that explicit dim-expanding work? – Divakar Mar 03 '17 at 09:05
  • @Divakar care to elaborate on that? – User1291 Mar 03 '17 at 13:30

1 Answers1

1

You can swap your axes such that the first axes is the one you want to normalize. This should also work inplace, since swapaxes just returns a view on your data.

Using the numpy command swapaxes:

for d in range(ndim):

    m = np.mean(ret,axis=d)
    s = np.std(ret,axis=d)

    ret = np.swapaxes(ret, 0, d)

    # Perform Normalisation of Axis
    ret -= m
    ret /= s

    ret = np.swapaxes(ret, 0, d)
  • So it just re-assigns axis labels without moving any data? Because although this method does get the means sufficiently low, I don't exactly get unit variances. – User1291 Mar 03 '17 at 11:55
  • Yes, it does not change the data. But you are right, this approach also fails with more than 2 dimensions somehow. – Stefan T. Huber Mar 03 '17 at 12:54
  • Well, turns out the reason *why* this is not working is because I'm doing the wrong thing. They did write "for each dimension", but what they actually meant was "for each column" so according to them, a two dimensional ``m by n`` matrix has ``n`` "dimensions". -.- A simple broadcast would've sufficed. – User1291 Mar 03 '17 at 14:56