0

I am trying to code a function which takes a NumPy array and normalizes it. I have written the code below:

def normalize_min_max(A, axis = None):
    ptr = axis
    minimum = np.amin(A, ptr)
    maximum = np.amax(A, ptr)
    for x in np.nditer(A):
        x = (x - minimum)/(maximum - minimum)
    return A

Unfortunately it doesn't work, because the returned array is unchanged. How can I repair it?

Hendrra

4 Answers

2

In any Python iteration,

for x in np.nditer(A):
    x = (x - minimum)/(maximum - minimum)

assigning a value to the iteration variable only rebinds that name to a new object; it does not modify the original list/array.
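A minimal sketch of that rebinding behaviour with a plain Python list (the names values, untouched and updated are only illustrative):

```python
values = [1.0, 2.0, 3.0]

# Assigning to the loop variable rebinds the name x; the list is untouched.
for x in values:
    x = x * 10
untouched = list(values)

# Writing through an index (or a slice) does modify the container.
for i, x in enumerate(values):
    values[i] = x * 10
updated = list(values)

print(untouched)  # [1.0, 2.0, 3.0]
print(updated)    # [10.0, 20.0, 30.0]
```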

I tried

for x in np.nditer(A):
    x[:] = (x - minimum)/(maximum - minimum)

but got an error

ValueError: assignment destination is read-only

I had to go to the nditer documentation, https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.nditer.html#arrays-nditer, to find out how to make it a read/write variable:

In [388]: def normalize_min_max(A, axis = None):
     ...:     ptr = axis
     ...:     minimum = np.amin(A, ptr)
     ...:     maximum = np.amax(A, ptr)
     ...:     for x in np.nditer(A, op_flags=['readwrite']):
     ...:         x[...] = (x - minimum)/(maximum - minimum)
     ...:     return A
     ...: 
     ...: 
In [389]: normalize_min_max(np.arange(10))
Out[389]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])

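Oops, the original array has an integer dtype, so the float results are truncated when they are written back into it. With a float array: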

In [390]: normalize_min_max(np.arange(10.))
Out[390]: 
array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
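If the caller may pass integer data, one option is to work on a float copy. This is only a sketch: the function name normalize_min_max_float is made up for illustration, and copying the input (rather than modifying it) is my assumption about what is wanted.

```python
import numpy as np

def normalize_min_max_float(A):
    """Hypothetical variant: normalize a float copy so integer input is not truncated."""
    A = np.array(A, dtype=float)      # always a new float array; the caller's array is untouched
    minimum = np.amin(A)
    maximum = np.amax(A)
    with np.nditer(A, op_flags=['readwrite']) as it:   # context manager needs NumPy >= 1.15
        for x in it:
            x[...] = (x - minimum) / (maximum - minimum)
    return A

ints = np.arange(10)
result = normalize_min_max_float(ints)
```

Here result holds ten evenly spaced floats from 0 to 1, and ints keeps its original integer values.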

But I don't need to iterate to perform this kind of calculation:

In [391]: def normalize_min_max1(A, axis = None):
     ...:     ptr = axis
     ...:     minimum = np.amin(A, ptr, keepdims=True)
     ...:     maximum = np.amax(A, ptr, keepdims=True)
     ...:     return (A-minimum)/(maximum-minimum)

In [392]: normalize_min_max1(np.arange(10.))
Out[392]: 
array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])

nditer does work in this context because the iteration variable is modifiable, whereas it isn't with for x in A: .... But otherwise it's a complex iterator and does not offer any speed advantage. As shown on the nditer tutorial page, it is most useful as a stepping stone to using nditer in Cython.

Also, your nditer code does not work with axis values. Mine, with the keepdims parameter, does:

In [396]: normalize_min_max1(np.arange(10.).reshape(5,2),0)
Out[396]: 
array([[ 0.  ,  0.  ],
       [ 0.25,  0.25],
       [ 0.5 ,  0.5 ],
       [ 0.75,  0.75],
       [ 1.  ,  1.  ]])
In [397]: normalize_min_max1(np.arange(10.).reshape(5,2),1)
Out[397]: 
array([[ 0.,  1.],
       [ 0.,  1.],
       [ 0.,  1.],
       [ 0.,  1.],
       [ 0.,  1.]])
In [398]: normalize_min_max1(np.arange(10.).reshape(5,2),None)
Out[398]: 
array([[ 0.        ,  0.11111111],
       [ 0.22222222,  0.33333333],
       [ 0.44444444,  0.55555556],
       [ 0.66666667,  0.77777778],
       [ 0.88888889,  1.        ]])

The nditer code with an axis value:

In [395]: normalize_min_max(np.arange(10.).reshape(5,2),0)
...
ValueError: could not broadcast input array from shape (2) into shape ()

The nditer variable is a 0d array, which allows it to be modified. But that complicates using it with the min/max values which may be arrays. We'd have to include those arrays in the nditer setup. So it's possible, but normally not worth the extra work.
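For completeness, here is a rough sketch of what "including those arrays in the nditer setup" could look like. The function name normalize_min_max_nditer and the choice to work on a float copy are my assumptions; it relies on nditer broadcasting the keepdims min/max against A.

```python
import numpy as np

def normalize_min_max_nditer(A, axis=None):
    """Hypothetical sketch: nditer over A together with broadcast min/max operands."""
    A = np.array(A, dtype=float)                 # float copy (assumption: input is not modified)
    minimum = np.amin(A, axis, keepdims=True)    # keepdims keeps these broadcastable against A
    maximum = np.amax(A, axis, keepdims=True)
    it = np.nditer([A, minimum, maximum],
                   op_flags=[['readwrite'], ['readonly'], ['readonly']])
    with it:                                     # context manager needs NumPy >= 1.15
        for x, lo, hi in it:                     # each is a 0d array
            x[...] = (x - lo) / (hi - lo)
    return A

a = np.arange(10.).reshape(5, 2)
by_columns = normalize_min_max_nditer(a, 0)
by_rows = normalize_min_max_nditer(a, 1)
```

This gives the same values as the vectorized normalize_min_max1, with considerably more machinery, which is why it is rarely worth it.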

hpaulj
1
return [(x - minimum)/(maximum - minimum) for x in np.nditer(A)]

Alternately, for in-place array normalization, see this answer.
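The linked answer is not reproduced here. As a rough sketch of one common way to normalize in place (not necessarily what that answer does), the augmented assignment operators write into the existing array. The function name normalize_min_max_inplace is made up, and the float-dtype check is my assumption, since an integer array cannot hold the results.

```python
import numpy as np

def normalize_min_max_inplace(A, axis=None):
    """Hypothetical helper: rescale a float array to [0, 1] in place."""
    if not np.issubdtype(A.dtype, np.floating):
        raise TypeError("in-place normalization needs a floating-point array")
    # Compute both reductions before A is modified; keepdims keeps them broadcastable.
    minimum = np.amin(A, axis, keepdims=True)
    maximum = np.amax(A, axis, keepdims=True)
    A -= minimum                 # writes into A's own buffer
    A /= (maximum - minimum)
    return A

a = np.arange(10.).reshape(5, 2)
out = normalize_min_max_inplace(a, 0)
```

After the call, out is the same object as a, and a itself holds the normalized values.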

andrew_reece
1

Why the for loop? Here is a vectorized solution with some axis trickery to ensure the shapes align with the input axis:

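def normalize_min_max(A, axis=None):
    A = np.asarray(A)
    ax = 0 if axis is None else axis
    A_min = A.min(axis=axis)
    # Move the reduced axis to the front so the reduced min/max broadcast against A
    A = (np.moveaxis(A, ax, 0) - A_min) / (A.max(axis=axis) - A_min)
    # Move that axis back to its original position (also correct for ndim > 2)
    return np.moveaxis(A, 0, ax)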

Some results:

In[175]: a = np.arange(4*3, dtype='float32').reshape(4, 3)
In[176]: a
Out[176]: 

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.]], dtype=float32)
In[177]: normalize_min_max(a, None)
Out[177]: 

array([[ 0.        ,  0.09090909,  0.18181819],
       [ 0.27272728,  0.36363637,  0.45454547],
       [ 0.54545456,  0.63636363,  0.72727275],
       [ 0.81818181,  0.90909094,  1.        ]], dtype=float32)
In[178]: normalize_min_max(a, 0)
Out[178]: 

array([[ 0.        ,  0.        ,  0.        ],
       [ 0.33333334,  0.33333334,  0.33333334],
       [ 0.66666669,  0.66666669,  0.66666669],
       [ 1.        ,  1.        ,  1.        ]], dtype=float32)
In[179]: normalize_min_max(a, 1)
Out[179]: 

array([[ 0. ,  0.5,  1. ],
       [ 0. ,  0.5,  1. ],
       [ 0. ,  0.5,  1. ],
       [ 0. ,  0.5,  1. ]], dtype=float32)
Devin Cairns
  • That's pretty much what I would like to do with the axis. But unfortunately it does not work. There is an error: operands could not be broadcast together with shapes (2,3) (2,) – Hendrra Oct 22 '17 at 21:41
  • Is your goal to perform the normalization on each axis separately? – Devin Cairns Oct 22 '17 at 21:44
  • Yes. I would like to add a parameter (axis = None, 1 or -1). And the function should perform the normalization on the array, rows or columns. – Hendrra Oct 22 '17 at 21:45
  • It works pretty well with columns (axis = 0; I was wrong in the comment above). But rows are not working. – Hendrra Oct 22 '17 at 22:01
  • Sorry, it works with rows too, but unfortunately there are some negative numbers. I don't know how to deal with them – Hendrra Oct 22 '17 at 22:03
  • I've made an edit that will hopefully result in your desired outcome – Devin Cairns Oct 22 '17 at 22:26
1

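First method: a single vectorized expression instead of a loop. Note that A = ... rebinds the local name to a new NumPy array, so the input array itself is not modified in place.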

import numpy as np

def normalize_min_max(A, axis = None):
    ptr = axis
    minimum = np.amin(A, ptr)
    maximum = np.amax(A, ptr)
    A = (A - minimum)/(maximum - minimum)
    return A

np_array = np.array([[1,2, 3,4],[2,3,4,5]]) # example input 

print(normalize_min_max(np_array))

Output:

[[ 0.    0.25  0.5   0.75]
 [ 0.25  0.5   0.75  1.  ]]

Second method (your style): create a new NumPy array with the same shape as your input array and store your normalized values there

import numpy as np

def normalize_min_max(A, axis = None):
    ptr = axis
    norm_A = np.empty(A.shape)
    minimum = np.amin(A, ptr)
    maximum = np.amax(A, ptr)
    delta = maximum - minimum
    for indx, x in np.ndenumerate(A):
        norm_A[indx] = (x - minimum)/delta
    return norm_A

np_array = np.array([[1,2, 3,4], [2,3,4,5]])

print(normalize_min_max(np_array))

Output:

[[ 0.    0.25  0.5   0.75]
 [ 0.25  0.5   0.75  1.  ]]

NOTE: I am assuming that you are only interested in the min/max over all the elements of your NumPy array, which is why your default axis is None. As @hpaulj explained for nditer, the ndenumerate loop won't work with any axis other than None. If you want to use other axes, I suggest using method 1 above (with keepdims=True in the amin/amax calls, as in @hpaulj's answer, so that the shapes broadcast).

utengr