3

I have a 2D numpy array "signals" of shape (100000, 1024). Each row contains the amplitude trace of a signal, which I want to normalise to be within 0-1.

The signals each have different amplitudes, so I can't just divide by one common factor. Is there a way to normalise each signal individually so that every value within it is between 0 and 1?

Let's say the signals look something like [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]] and I want them to become [[0,0.125,0.25,0.375,0.625,1,0.25,0.125],[0,0.2,0.5,1,0.7,0.4,0.2,0.1]].

Is there a way to do it without looping over all 100,000 signals, as this will surely be slow?

Thanks!

Beth Long

3 Answers

4

The easy thing to do is to build a new numpy array of per-row maximum values (max along axis 1) and divide by it, using broadcasting:

import numpy as np

a = np.array([[0, 1, 2, 3, 5, 8, 2, 1], [0, 2, 5, 10, 7, 4, 2, 1]])

# Per-row maxima, shape (2,)
b = np.max(a, axis=1)

# Add a trailing axis so b broadcasts against each row of a
print(a / b[:, np.newaxis])

output:

[[0.    0.125 0.25  0.375 0.625 1.    0.25  0.125]
 [0.    0.2   0.5   1.    0.7   0.4   0.2   0.1  ]]
Roland Deschain
  • This is great - the only issue (which I should have said before!) is that some of the "signals" have no signals in them, and are therefore arrays of 0s. Is there a clever way to avoid attempting to divide by 0? – Beth Long Jul 08 '20 at 11:01
  • Nice answer. The original poster might find some relevant information in this related post https://stackoverflow.com/questions/19602187/numpy-divide-each-row-by-a-vector-element too. Best regards. – smile Jul 08 '20 at 11:01
  • @BethLong you could just use numpy.nan_to_num() on the resulting array. This will turn the NaNs you get from the division by zero into zeros (see the sketch after these comments). – Roland Deschain Jul 08 '20 at 11:05
  • Alternatively, check the documentation here https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.divide.html for information on how to handle the division by zero, in particular the seterr example near the end of that page. Best regards – smile Jul 08 '20 at 11:06
  • Fantastic, thank you very much to both of you. I'll go with the nan_to_num option since it's straightforward, but I'll check out that other link as well. Much appreciated! – Beth Long Jul 08 '20 at 11:07
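
A minimal sketch of the zero-division handling discussed in these comments: np.divide with out and where leaves all-zero rows at zero instead of producing NaNs, so no nan_to_num pass is needed afterwards (nan_to_num on the plain division also works, as noted above). The array name signals is just illustrative:

import numpy as np

# Illustrative data: the second "signal" is all zeros
signals = np.array([[0, 1, 2, 3, 5, 8, 2, 1],
                    [0, 0, 0, 0, 0, 0, 0, 0]], dtype=float)

# Per-row maxima, kept 2D so they broadcast against the rows
row_max = signals.max(axis=1, keepdims=True)

# Divide only where the maximum is non-zero; other rows keep the zeros from `out`
normalised = np.divide(signals, row_max,
                       out=np.zeros_like(signals),
                       where=row_max != 0)
print(normalised)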
4

Adding a little benchmark to show just how significant the performance difference between the two solutions is:

import numpy as np
import timeit

arr = np.arange(1024).reshape(128, 8)

def using_list_comp():
    # Loop-based: normalise each row inside a Python list comprehension
    return np.array([s / np.max(s) for s in arr])

def using_vectorized_max_div():
    # Vectorized: divide by the per-row maxima via broadcasting
    return arr / arr.max(axis=1)[:, np.newaxis]

result1 = using_list_comp()
result2 = using_vectorized_max_div()

print("Results equal:", (result1 == result2).all())

time1 = timeit.timeit('using_list_comp()', globals=globals(), number=1000)
time2 = timeit.timeit('using_vectorized_max_div()', globals=globals(), number=1000)

print(time1)
print(time2)
print(time1 / time2)

On my machine the output is:

Results equal: True
0.9873569
0.010177099999999939
97.01750989967731

Almost a 100x difference!

Adam.Er8
3

Another solution is to use sklearn.preprocessing.normalize:

from sklearn.preprocessing import normalize
data = [[0,1,2,3,5,8,2,1],[0,2,5,10,7,4,2,1]]
normalize(data, axis=1, norm='max')

result:

array([[0.   , 0.125, 0.25 , 0.375, 0.625, 1.   , 0.25 , 0.125],
       [0.   , 0.2  , 0.5  , 1.   , 0.7  , 0.4  , 0.2  , 0.1  ]])

Note the norm='max' argument; the default norm is 'l2'.

ipj
  • This is very useful, but I tested it with the script that Adam.Er8 posted above and it seems to take ~6 times longer than the vectorized division method (a sketch of how the benchmark can be extended follows below). Thanks for the comment though! – Beth Long Jul 08 '20 at 11:32
  • I've deleted my previous answer that used a list comprehension, since it was a loop-based solution. The vectorized way is indeed the fastest. – ipj Jul 08 '20 at 11:39
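
For reference, a minimal sketch of how the benchmark above could be extended to also time the sklearn approach, assuming scikit-learn is installed; the function name using_sklearn_normalize is just illustrative, and timings will vary by machine:

import numpy as np
import timeit
from sklearn.preprocessing import normalize

# Same test array as in the benchmark above
arr = np.arange(1024).reshape(128, 8)

def using_sklearn_normalize():
    # Row-wise max normalisation via scikit-learn
    return normalize(arr, axis=1, norm='max')

time3 = timeit.timeit('using_sklearn_normalize()', globals=globals(), number=1000)
print(time3)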