How to normalize a 2-dimensional numpy array in python less verbose?

Question

Given a 3 times 3 numpy array

import numpy

a = numpy.arange(0,27,3).reshape(3,3)

# array([[ 0,  3,  6],
#        [ 9, 12, 15],
#        [18, 21, 24]])

To normalize the rows of the 2-dimensional array I thought of

row_sums = a.sum(axis=1) # array([ 9, 36, 63])
new_matrix = numpy.zeros((3,3))
for i, (row, row_sum) in enumerate(zip(a, row_sums)):
    new_matrix[i,:] = row / row_sum

There must be a better way, isn't there?

To clarify: by normalizing I mean that the entries of each row must sum to one. But I think that will be clear to most people.

Careful, "normalize" usually means the *square* sum of components is one. Your definition will hardly be clear to most people;) — coldfix, Jul 13 '15 at 18:10
@coldfix speaks about `L2` norm and considers it as most common (which may be true) while Aufwind uses `L1` norm which is also a norm indeed. — Bálint Sass, Feb 12 '21 at 09:50

score 173 · Accepted Answer · edited Jan 18 '12 at 04:27

Broadcasting is really good for this:

row_sums = a.sum(axis=1)
new_matrix = a / row_sums[:, numpy.newaxis]

row_sums[:, numpy.newaxis] reshapes row_sums from being (3,) to being (3, 1). When you do a / b, a and b are broadcast against each other.

You can learn more about broadcasting here or even better here.
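
As a minimal sketch of the shapes involved (using the array from the question, with `np` as an assumed alias for numpy), the following shows that `np.newaxis`, `None`, `reshape(-1, 1)` and `keepdims=True` all produce the same (3, 1) divisor:

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3)

row_sums = a.sum(axis=1)                 # shape (3,)
col = row_sums[:, np.newaxis]            # shape (3, 1)

# Three other spellings of the same (3, 1) column of row sums.
alternatives = [
    row_sums[:, None],                   # None is an alias for np.newaxis
    row_sums.reshape(-1, 1),
    a.sum(axis=1, keepdims=True),
]

new_matrix = a / col                     # (3, 3) / (3, 1) broadcasts per row
print(row_sums.shape, col.shape)
print(new_matrix.sum(axis=1))
```

Which spelling to use is mostly a matter of taste; `keepdims=True` avoids the separate reshaping step.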

edited Jan 18 '12 at 04:27

Daniel Fischer

answered Jan 18 '12 at 03:21

Bi Rico

41

This can be simplified even further using `a.sum(axis=1, keepdims=True)` to keep the singleton column dimension, which you can then broadcast along without having to use `np.newaxis`. – ali_m Apr 23 '15 at 13:26
9

what if any of the row_sums is zero? – asdf Apr 24 '15 at 23:31
@asdf ...well in that case normalizing by the row sum doesn't really make much sense! – ali_m Apr 25 '15 at 19:25
12

This is the correct answer for the question as stated above - but if a normalization in the usual sense is desired, use `np.linalg.norm` instead of `a.sum`! – coldfix Jul 13 '15 at 18:12
2

is this preferred to `row_sums.reshape(3,1)` ? – Paul Aug 10 '15 at 02:09
1

It's not as robust since the row sum may be 0. – nos Jun 08 '16 at 22:48
If a vector is normalized, it should have a unit norm, using a / row_sums[:, numpy.newaxis] really doesn't guarantee a unit norm. – XY.W Jan 12 '17 at 09:37
@XY.W There are many definitions of "unit norm", take a look at the ord argument to [numpy's norm function](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.linalg.norm.html). Ord 1 norms are often useful and the OP asked specifically about normalizing with respect to this norm, but you can of course replace the denominator with the most appropriate norm for your application. – Bi Rico Jan 13 '17 at 20:29
Is this the same as MinMaxNorm or what is the name of this normalization? – Mona Jalal Sep 23 '17 at 00:15
This is equivalent to `new_matrix = a / row_sums[:, None]`, as `None` can be used as a shorthand for `np.newaxis`. – johannesack May 07 '21 at 12:02
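
Several comments above ask what happens when a row sum is zero. One possible way to handle it, sketched here under the assumption that all-zero rows should simply stay all-zero (the thread does not settle on a convention), is `np.divide` with its `where` and `out` arguments:

```python
import numpy as np

a = np.array([[0.0, 3.0, 6.0],
              [0.0, 0.0, 0.0],   # this row sums to zero
              [18.0, 21.0, 24.0]])

row_sums = a.sum(axis=1, keepdims=True)

# Divide only where the row sum is nonzero; rows with a zero sum keep
# the value from `out`, which is initialised to zeros here.
safe = np.divide(a, row_sums, out=np.zeros_like(a), where=row_sums != 0)
print(safe)
```

This avoids the NaN entries and the runtime warning that a plain `a / row_sums` would produce for the zero row.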

score 133 · Answer 2 · edited Nov 21 '20 at 20:58

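Scikit-learn offers a function normalize() that lets you apply various normalizations. Making each row sum to 1 is L1 normalization (dividing by the sum of absolute values, which equals the plain sum when all entries are non-negative). Therefore: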

import numpy
from sklearn.preprocessing import normalize

matrix = numpy.arange(0,27,3).reshape(3,3).astype(numpy.float64)
# array([[  0.,   3.,   6.],
#        [  9.,  12.,  15.],
#        [ 18.,  21.,  24.]])

normed_matrix = normalize(matrix, axis=1, norm='l1')
# [[ 0.          0.33333333  0.66666667]
#  [ 0.25        0.33333333  0.41666667]
#  [ 0.28571429  0.33333333  0.38095238]]

Now your rows will sum to 1.
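
One caveat worth illustrating: the L1 norm of a row is the sum of its absolute values, so "rows sum to 1" and "rows have L1 norm 1" only coincide when no entry is negative. A NumPy-only sketch of the difference, where the array `b` is made-up example data:

```python
import numpy as np

b = np.array([[1.0, -3.0],
              [2.0, 2.0]])   # hypothetical data with a negative entry

by_sum = b / b.sum(axis=1, keepdims=True)          # rows sum to 1
by_l1 = b / np.abs(b).sum(axis=1, keepdims=True)   # rows have L1 norm 1

print(by_sum)
print(by_l1)
```

For the non-negative matrix in the question both expressions give the same result.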

edited Nov 21 '20 at 20:58

normanius

answered Mar 20 '14 at 22:54

rogueleaderr

3

This also has the advantage that it works on sparse arrays that would not fit into memory as dense arrays. – JEM_Mosig Jan 29 '20 at 19:58

score 11 · Answer 3 · answered Jan 18 '12 at 03:22

I think this should work,

a = numpy.arange(0,27.,3).reshape(3,3)

a /=  a.sum(axis=1)[:,numpy.newaxis]

answered Jan 18 '12 at 03:22

tom10

2

good. note the change of dtype to arange, by appending decimal point to 27. – wim Jan 18 '12 at 03:36

score 6 · Answer 4 · edited May 10 '14 at 20:33

In case you are trying to normalize each row such that its magnitude is one (i.e. a row's unit length is one or the sum of the square of each element in a row is one):

import numpy as np

a = np.arange(0,27,3).reshape(3,3)

result = a / np.linalg.norm(a, axis=-1)[:, np.newaxis]
# array([[ 0.        ,  0.4472136 ,  0.89442719],
#        [ 0.42426407,  0.56568542,  0.70710678],
#        [ 0.49153915,  0.57346234,  0.65538554]])

Verifying:

np.sum( result**2, axis=-1 )
# array([ 1.,  1.,  1.])
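
The same pattern generalizes to other norms through the `ord` argument of `np.linalg.norm`. A small sketch, where `normalize_rows` is a hypothetical helper name (not part of NumPy) and `keepdims` assumes NumPy 1.10 or newer:

```python
import numpy as np

def normalize_rows(a, ord=2):
    """Divide each row of a 2-D array by its vector norm of the given order.

    `normalize_rows` is a hypothetical helper, not a NumPy function.
    """
    a = np.asarray(a, dtype=float)
    norms = np.linalg.norm(a, ord=ord, axis=1, keepdims=True)
    return a / norms

a = np.arange(0, 27, 3).reshape(3, 3)
l1 = normalize_rows(a, ord=1)   # rows sum to 1 (entries are non-negative)
l2 = normalize_rows(a, ord=2)   # squared entries of each row sum to 1
```

With `ord=1` this reproduces the row-sum normalization from the question; with `ord=2` it reproduces the result above.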

edited May 10 '14 at 20:33

answered May 10 '14 at 19:13

walt

Axis doesn't seem to be a parameter to np.linalg.norm (anymore?). – Ztyx May 25 '14 at 11:06
notably this corresponds to the l2 norm (where as rows summing to 1 corresponds to the l1 norm) – dpb Oct 28 '14 at 22:40

score 4 · Answer 5 · answered Oct 16 '18 at 04:45

I think you can normalize each row so that its elements sum to 1 with new_matrix = a / a.sum(axis=1, keepdims=True). Column normalization works the same way: new_matrix = a / a.sum(axis=0, keepdims=True). Hope this helps.
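
A short sketch of both variants on the array from the question, checking which axis ends up summing to 1:

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3)

rows = a / a.sum(axis=1, keepdims=True)   # divisor shape (3, 1)
cols = a / a.sum(axis=0, keepdims=True)   # divisor shape (1, 3)

print(rows.sum(axis=1))   # each row sums to 1
print(cols.sum(axis=0))   # each column sums to 1
```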

answered Oct 16 '18 at 04:45

Snoopy

score 2 · Answer 6 · answered Oct 31 '19 at 05:00

You could use built-in numpy function: np.linalg.norm(a, axis = 1, keepdims = True)

answered Oct 31 '19 at 05:00

Saurabh Gupta

This computes the norm and does not normalize the matrix – qwr Mar 30 '22 at 19:20

score 1 · Answer 7 · answered Nov 08 '15 at 15:13

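It appears that this also works, provided the row sums keep their column shape:

def normalizeRows(M):
    # keepdims=True gives shape (n, 1), so each row is divided by its own sum
    row_sums = M.sum(axis=1, keepdims=True)
    return M / row_sums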
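
Whether `M / row_sums` normalizes rows depends on the shape of `row_sums`. As a sketch of the pitfall: a 1-D `(3,)` divisor is broadcast along the last axis, so element `[i, j]` is divided by the sum of row `j`, while a `(3, 1)` divisor divides it by the sum of row `i`.

```python
import numpy as np

M = np.arange(0, 27, 3).reshape(3, 3)

flat = M / M.sum(axis=1)                   # (3, 3) / (3,): column j divided by sum of row j
kept = M / M.sum(axis=1, keepdims=True)    # (3, 3) / (3, 1): row i divided by sum of row i

print(flat.sum(axis=1))
print(kept.sum(axis=1))
```

Only the second form makes every row sum to 1 for this matrix.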

answered Nov 08 '15 at 15:13

Jamesszm

score 0 · Answer 8 · answered Feb 21 '17 at 11:20

You could also use matrix transposition:

(a.T / row_sums).T
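
A quick check, taking `row_sums = a.sum(axis=1)` from the question, that the transposed form gives the same result as the `newaxis` form:

```python
import numpy as np

a = np.arange(0, 27, 3).reshape(3, 3)
row_sums = a.sum(axis=1)

via_transpose = (a.T / row_sums).T
via_newaxis = a / row_sums[:, np.newaxis]
print(np.allclose(via_transpose, via_newaxis))
```

Transposing puts the row index on the last axis, which is the axis a 1-D divisor is broadcast along.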

answered Feb 21 '17 at 11:20

Maciek

this answer is incomplete without how you computed `row_sums` – qwr Mar 30 '22 at 19:19
It is in the original question: `row_sums = a.sum(axis=1)` – Maciek Apr 01 '22 at 07:08

score 0 · Answer 9 · answered Nov 07 '20 at 21:36

Here is one more possible way using reshape:

a_norm = (a/a.sum(axis=1).reshape(-1,1)).round(3)
print(a_norm)

Or using None works too:

a_norm = (a/a.sum(axis=1)[:,None]).round(3)
print(a_norm)

Output:

array([[0.   , 0.333, 0.667],
       [0.25 , 0.333, 0.417],
       [0.286, 0.333, 0.381]])

score 0 · Answer 10 · answered Jan 19 '23 at 23:01

Use

a = a / np.linalg.norm(a, ord = 2, axis = 0, keepdims = True)

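Because of keepdims = True, the norms broadcast against a as intended. Note that axis = 0 scales each column to unit L2 norm; use axis = 1 to normalize the rows, as in the question.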

answered Jan 19 '23 at 23:01

Moj

score -1 · Answer 11 · answered Jan 12 '17 at 09:31

Or use a lambda function with map, like

>>> import numpy as np
>>> vec = np.arange(0,27,3).reshape(3,3)
>>> norm_vec = list(map(lambda row: row/np.linalg.norm(row), vec))  # list() because map is lazy on Python 3

Each row of vec will then have unit (L2) norm.

answered Jan 12 '17 at 09:31

XY.W

is this using python's `map`? won't builtin numpy functions be much faster? – qwr Mar 30 '22 at 19:21

score -1 · Answer 12 · answered Oct 13 '21 at 17:41

We can achieve the same effect by premultiplying with the diagonal matrix whose main diagonal is the reciprocal of the row sums.

A = np.diag(1.0 / A.sum(axis=1)) @ A  # 1.0 / ... also works for integer arrays, where **-1 raises an error

answered Oct 13 '21 at 17:41

kimegitee

too inefficient. you turned a simple sum over all elements into a big (sparse) matrix multiplication – qwr Mar 30 '22 at 19:22
@qwr The original poster did not ask for a more efficient version, only a less "verbose" one. – kimegitee Dec 05 '22 at 19:02
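
To make the trade-off discussed in these comments concrete, here is a small sketch comparing the diagonal-matrix product with plain broadcasting. The variable names are illustrative only:

```python
import numpy as np

A = np.arange(0, 27, 3).reshape(3, 3)
inv_sums = 1.0 / A.sum(axis=1)                # reciprocal row sums, shape (3,)

via_diag = np.diag(inv_sums) @ A              # builds an n-by-n diagonal matrix first
via_broadcast = A * inv_sums[:, np.newaxis]   # same result without the n-by-n intermediate

print(np.allclose(via_diag, via_broadcast))
```

Both give the same matrix; the broadcast form just skips the intermediate diagonal matrix and the matrix multiplication.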
