
I'd like to normalize my training set before passing it to my NN. Instead of doing it manually (subtract the mean and divide by the std), I tried keras.utils.normalize() and I was amazed by the results I got.

Running this:

import numpy as np
from keras.utils import normalize

r = np.random.rand(3000) * 1000
nr = normalize(r)
print(np.mean(r))
print(np.mean(nr))
print(np.std(r))
print(np.std(nr))
print(np.min(r))
print(np.min(nr))
print(np.max(r))
print(np.max(nr))

results in:

495.60440066771866
0.015737914577213984
291.4440194021
0.009254802974329002
0.20755517410064872
6.590913227674956e-06
999.7631481267636
0.03174747238214018

Unfortunately, the docs don't explain what's happening under the hood. Can you please explain what it does and whether I should use keras.utils.normalize instead of the manual approach?

1 Answer

It is not the kind of normalization you expect. Under the hood, it uses np.linalg.norm() to normalize the given data with an Lp-norm:

def normalize(x, axis=-1, order=2):
    """Normalizes a Numpy array.
    # Arguments
        x: Numpy array to normalize.
        axis: axis along which to normalize.
        order: Normalization order (e.g. 2 for L2 norm).
    # Returns
        A normalized copy of the array.
    """
    l2 = np.atleast_1d(np.linalg.norm(x, order, axis))
    l2[l2 == 0] = 1
    return x / np.expand_dims(l2, axis)

For example, in the default case it applies L2-normalization, i.e. each sample is divided by its L2 norm so that the sum of the squares of its elements along the given axis equals one.
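As a quick sanity check, here is a minimal sketch (using NumPy and the normalize function shown above; the array x is just illustrative) verifying that each row of the output has unit L2 norm:

import numpy as np
from keras.utils import normalize

x = np.random.rand(4, 5) * 100  # 4 samples, 5 features
xn = normalize(x)               # default: L2 norm along the last axis

# Each row's sum of squares is now (approximately) 1.
print(np.sum(xn ** 2, axis=-1))  # -> [1. 1. 1. 1.]

# Equivalently: the input divided by its row-wise L2 norm.
print(np.allclose(xn, x / np.linalg.norm(x, ord=2, axis=-1, keepdims=True)))  # True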

You can use this function if Lp-normalization is really what you want. Otherwise, if you don't want to do mean/std normalization manually, you can use StandardScaler() from sklearn, or MinMaxScaler() if you want min/max scaling instead.
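If you go the scaler route, here is a minimal sketch of the manual mean/std approach next to StandardScaler (shapes and variable names are just illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.random.rand(3000, 10) * 1000  # 3000 samples, 10 features

# Manual: subtract the per-feature mean and divide by the per-feature std.
x_manual = (x - x.mean(axis=0)) / x.std(axis=0)

# sklearn: fit the statistics on the training set, then transform it.
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

print(np.allclose(x_manual, x_scaled))  # True (up to floating-point error)

One advantage of the scaler object is that you can later call scaler.transform() on your validation/test data to reuse the statistics fitted on the training set.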
