310

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:

import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm

This function handles the situation where the vector v has a norm of 0.

Are there any similar functions provided in sklearn or numpy?

Donbeo
  • What's wrong with what you've written? – ali_m Jan 09 '14 at 20:30
  • If this is really a concern, you should check for norm < epsilon, where epsilon is a small tolerance. In addition, I wouldn't silently pass back a zero-norm vector, I would `raise` an exception! (See the sketch after these comments.) – Hooked Jan 09 '14 at 20:51
  • My function works, but I would like to know if there is something in one of Python's more common libraries. I am writing different machine learning functions and I would like to avoid defining too many new functions, to keep the code clearer and more readable. – Donbeo Jan 09 '14 at 21:08
  • I did a few quick tests and I found that `x/np.linalg.norm(x)` was not much slower (about 15-20%) than `x/np.sqrt((x**2).sum())` in numpy 1.15.1 on a CPU. – Bill Sep 10 '18 at 19:10
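A minimal sketch of Hooked's suggestion from the comments, assuming an illustrative tolerance eps and an illustrative choice of ValueError (neither is prescribed by the comment):

import numpy as np

def normalize_strict(v, eps=1e-12):
    # eps is an arbitrary tolerance; tune it to your data's scale
    norm = np.linalg.norm(v)
    if norm < eps:
        # fail loudly instead of silently returning the zero vector
        raise ValueError("cannot normalize a vector with (near-)zero norm")
    return v / norm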

15 Answers

237

If you're using scikit-learn you can use sklearn.preprocessing.normalize:

import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000) * 10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:, np.newaxis], axis=0).ravel()
print(np.all(norm1 == norm2))
# True
ali_m
  • Thanks for the answer, but are you sure that sklearn.preprocessing.normalize also works with vectors of shape (n,) or (n,1)? I am having some problems with this library. – Donbeo Jan 09 '14 at 21:17
  • `normalize` requires a 2D input. You can pass the `axis=` argument to specify whether you want to apply the normalization across the rows or columns of your input array. – ali_m Jan 09 '14 at 21:20
  • Note that the 'norm' argument of the normalize function can be either 'l1' or 'l2', and the default is 'l2'. If you want your vector's sum to be 1 (e.g. a probability distribution) you should use norm='l1' in the normalize function (see the sketch after these comments). – Ash Nov 06 '15 at 10:56
  • Also note that `np.linalg.norm(x)` calculates the 'l2' norm by default. If you want your vector's sum to be 1 you should use `np.linalg.norm(x, ord=1)`. – Omid Jul 28 '18 at 17:21
  • Note: x must be `ndarray` for it to work with the `normalize()` function. Otherwise it can be a `list`. – Ramin Melikov Apr 27 '20 at 04:17
  • NOTE: calculation for norm1 is much faster. `%%timeit` for norm1 calculation gives: `4.74 µs ± 24.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)`, whereas for norm2 it is: `34.2 µs ± 473 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)` – hafiz031 Dec 26 '22 at 14:00
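A short sketch illustrating the axis= and norm= points from the comments above, using a made-up 2D array:

import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# default norm='l2': each row gets unit Euclidean length
print(normalize(X, axis=1))

# norm='l1': each row sums to 1, like a probability distribution
print(normalize(X, norm='l1', axis=1))
# [[0.33333333 0.66666667]
#  [0.42857143 0.57142857]]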
66

I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a version for arbitrary axes that gives optimal performance.

import numpy as np

def normalized(a, axis=-1, order=2):
    # compute the norms along the given axis, then replace zero norms
    # with 1 so that zero vectors are returned unchanged
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2 == 0] = 1
    return a / np.expand_dims(l2, axis)

A = np.random.randn(3, 3, 3)
print(normalized(A, 0))
print(normalized(A, 1))
print(normalized(A, 2))

print(normalized(np.arange(3)[:, None]))
print(normalized(np.arange(3)))
Eelco Hoogendoorn
  • I did not deeply test ali_m's solution, but in some simple cases it seems to work. Are there situations where your function does better? – Donbeo Jan 09 '14 at 23:20
  • I don't know; but it works over arbitrary axes, and we have explicit control over what happens for length-0 vectors. – Eelco Hoogendoorn Jan 10 '14 at 06:52
  • Very nice! This should be in numpy, although order should probably come before axis in my opinion. – Neil G Jan 16 '15 at 15:57
  • @EelcoHoogendoorn Curious to understand why order=2 was chosen over others? – Henry Thornton Jul 05 '15 at 07:35
  • Because the Euclidean/Pythagorean norm happens to be the most frequently used one; wouldn't you agree? – Eelco Hoogendoorn Jul 06 '15 at 08:47
  • Pretty late, but I think it's worth mentioning that this is exactly why it is discouraged to use lowercase 'L' as a variable name... in my typeface 'l2' is indistinguishable from '12'. – bendl Jun 13 '17 at 17:45
  • normalizing `[inf, 1, 2]` yields `[nan, 0, 0]`, but shouldn't it be `[1, 0, 0]`? – pasbi Mar 09 '18 at 16:31
  • If you'd like to endow the fp-inf symbol with such semantics, sure, but that'd be kinda nonstandard. The fp standard is full of quirks anyway, but I think having such a function do anything but standard fp logic by default would just be confusing. – Eelco Hoogendoorn Mar 09 '18 at 17:18
  • Shouldn't the normalized array sum to 1 (at least I would expect it to)? I just tested this implementation with `[5,5]`, which yields `[0.70710678, 0.70710678]`, which sums to about `1.41`. Doesn't sound right to me. – Spenhouet Jun 21 '18 at 16:01
  • Look up the concept of the order of a norm. What you want is the 1-norm, which you can get by setting the order kwarg to 1. – Eelco Hoogendoorn Jun 21 '18 at 21:36
  • @bendl I think that's exactly why it's encouraged to use a better typeface. – anon01 Mar 25 '21 at 16:32
  • Regarding choice of variable name `l2`, most python users are not software engineers. They are mathematicians and scientists first, and their code tends to reflect that culture, where using single letters for variables is customary. I agree though. Using lowercase `l` or uppercase `O` as a variable names should *definitely* be avoided. Trust me, I fix software bugs for a living. – John Henckel Jan 03 '22 at 15:30
  • @pasbi then what should `[inf, 3, inf]` yield? `[1, 0, 1]` or `[0.5, 0, 0.5]` or something else? I'd say `[nan, 0, nan]` would be the right output so the user can then fill the `nan` values with your chosen filler to force the expectation (see the sketch after these comments). – scign May 29 '22 at 17:46
  • @scign I guess there's no right or wrong here, just more or less useful. I agree that your proposal is probably more useful in general. It's also more IEEE 754-conformant (as `∞/∞ = nan`). I don't remember my use case from four years ago; I presume it was something very special. – pasbi May 30 '22 at 08:18
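A sketch of the behaviour discussed in the last few comments: dividing by an infinite norm produces NaNs, which can then be replaced with a filler of your choosing (the filler value 1.0 here is an arbitrary assumption):

import numpy as np

v = np.array([np.inf, 1.0, 2.0])
n = v / np.linalg.norm(v)          # the norm is inf, so inf/inf gives nan
print(n)                           # [nan  0.  0.]
print(np.nan_to_num(n, nan=1.0))   # [1. 0. 0.], with the chosen filler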
40

This might also work for you

import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))

but fails when v has length 0.

In that case, introducing a small constant into the denominator prevents the zero division.
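For example, a minimal sketch (the constant 1e-12 is an arbitrary choice, not from the original answer):

import numpy as np

v = np.zeros(3)                  # worst case: the zero vector
eps = 1e-12                      # arbitrary small constant
normalized_v = v / (np.sqrt(np.sum(v**2)) + eps)
print(normalized_v)              # [0. 0. 0.] instead of a division-by-zero warning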

As proposed in the comments, one could also use

v/np.linalg.norm(v)
mrk
29

To avoid division by zero I use eps, but that's maybe not great.

import numpy as np

def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm
Eduard Feicho
12

If you have multidimensional data and want each axis normalized to its max or its sum:

def normalize(_d, to_sum=True, copy=True):
    # d is a (n x dimension) np array
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d

This uses NumPy's peak-to-peak function, np.ptp.

a = np.random.random((5, 3))

b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), each column sums to 1

c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1
Jaden Travnik
  • Watch out: if all values are the same in the original matrix, then ptp would be 0. Division by 0 will return nan (see the sketch below). – Mcmil Mar 10 '20 at 13:34
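A sketch of a guard against that case, replacing zero denominators with 1 before dividing (an assumed variant, not part of the original answer):

import numpy as np

def normalize_guarded(d, to_sum=True):
    # variant of the answer's function that leaves constant columns at 0
    d = np.copy(d).astype(float)
    d -= np.min(d, axis=0)
    denom = np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0)
    denom[denom == 0] = 1  # avoid nan for constant columns
    return d / denom

print(normalize_guarded(np.array([[1.0, 2.0], [1.0, 4.0]])))
# [[0. 0.]
#  [0. 1.]]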
12

If you don't need utmost precision, your function can be reduced to:

v_norm = v / (np.linalg.norm(v) + 1e-16)
sergio verduzco
10

There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))
Joe
10

You mentioned scikit-learn, so I want to share another solution.

scikit-learn MinMaxScaler

In scikit-learn, there is an API called MinMaxScaler which lets you customize the value range as you like.

It also deals with NaN issues for us.

NaNs are treated as missing values: disregarded in fit, and maintained in transform. ... see reference [1]

Code sample

The code is simple, just type

# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# create a MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
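A small sketch of customizing the output range via the feature_range parameter (the data here is made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# scale each column into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
print(scaler.fit_transform(X))
# [[-1. -1.]
#  [ 0.  0.]
#  [ 1.  1.]]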
Reference

[1] scikit-learn documentation: sklearn.preprocessing.MinMaxScaler
WY Hsu
  • This does a different type of transform. The OP wanted to scale the magnitude of the vector so that each vector has a length of 1; MinMaxScaler individually scales each column independently to be within a certain range. – crypdick Dec 08 '20 at 14:32
8

If you work with multidimensional arrays, the following fast solution is possible.

Say we have a 2D array that we want to normalize along the last axis, while some rows have zero norm.

import numpy as np

arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)  # np.float is removed in modern NumPy; use float or np.float64

lengths = np.linalg.norm(arr, axis=-1)
print(lengths)  # [ 3.74165739  0.         10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0.         0.         0.        ]
# [0.47673129 0.57207755 0.66742381]]
Stanislav Tsepa
7

If you want to normalize n-dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:

import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize

vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
max0r
7

If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.

import numpy as np
import vg

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True

I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.

paulmelnikow
7

Without sklearn, using just numpy, you can define your own function.

Assuming that the rows are the variables and the columns the samples (axis=1):

import numpy as np

# Example array
X = np.array([[1, 2, 3], [4, 5, 6]])

def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)

output:

X
array([[1, 2, 3],
       [4, 5, 6]])

stdmtx(X)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])

seralouk
  • These output arrays do not have unit norm. Subtracting the mean and giving the samples unit variance does not produce unit vectors. – crypdick Dec 08 '20 at 14:52
3

For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.

a / np.linalg.norm(a, axis=1, keepdims=True)
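A quick illustration with a made-up array:

import numpy as np

a = np.array([[3.0, 4.0],
              [6.0, 8.0]])

rows = a / np.linalg.norm(a, axis=1, keepdims=True)
print(rows)  # each row now has unit length: [[0.6 0.8], [0.6 0.8]]

cols = a / np.linalg.norm(a, axis=0, keepdims=True)
print(cols)  # each column now has unit length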
Cristian Arteaga
1

If you want all values in [0, 1] for a 1d-array, then just use

(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))

Where a is your 1d-array.

An example:

>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])

A note on this method: to preserve the proportions between values, the 1d-array must have at least one 0 and consist only of 0 and positive numbers.

sergzach
1

A simple dot product would do the job. No need for any extra package.

x = x/np.sqrt(x.dot(x))

By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use

norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x
Ka Wa Yip
  • I often use this trick: `x_normalised = x / (norm + (norm == 0))`, so in all cases where the norm is zero, you just divide by one. – user111950 May 01 '22 at 09:21