20

I have a huge data set from which I derive two sets of datapoints, which I then have to plot and compare. These two plots differ in their ranges, so I want to normalize both to the range [0, 1]. With the following code, one specific data set produces a constant line at 1, although the same normalization works well for other sets:

plt.plot(range(len(rvalue)),np.array(rvalue)/(max(rvalue)))

and for this code:

oldrange = max(rvalue) - min(rvalue)  # NORMALIZING
newmin = 0
newrange = 1 + 0.9999999999 - newmin
normal = map(
    lambda x, r=float(rvalue[-1] - rvalue[0]): ((x - rvalue[0]) / r)*1 - 0, 
    rvalue)
plt.plot(range(len(rvalue)), normal)

I get the error:

ZeroDivisionError: float division by zero

for all the data sets. I cannot figure out how to get both plots into one range for comparison.

– pypro
  • For those interested in normalizing data in Django, have a look a this solution: https://stackoverflow.com/a/68258914 – djvg Jul 05 '21 at 17:45

9 Answers

48

Use the following method to normalize your data to the range [0, 1] using the minimum and maximum values of the data sequence:

import numpy as np

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))
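For example, applied to a small illustrative array (the values here are made up):

```python
import numpy as np

def NormalizeData(data):
    return (data - np.min(data)) / (np.max(data) - np.min(data))

# Illustrative data; any numeric array works
arr = np.array([10.0, 20.0, 30.0, 40.0])
print(NormalizeData(arr))  # smallest value maps to 0, largest to 1
```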
– user3284005
21

Use scikit-learn: http://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range

It has built-in functions to scale features to a specified range. You'll find other functions to normalize and standardize there as well.

See this example:

>>> import numpy as np
>>> from sklearn import preprocessing
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
...
>>> min_max_scaler = preprocessing.MinMaxScaler()
>>> X_train_minmax = min_max_scaler.fit_transform(X_train)
>>> X_train_minmax
array([[ 0.5       ,  0.        ,  1.        ],
       [ 1.        ,  0.5       ,  0.33333333],
       [ 0.        ,  1.        ,  0.        ]])
– Marissa Novak
10

scikit-learn has a function for this:

sklearn.preprocessing.minmax_scale(X, feature_range=(0, 1), axis=0, copy=True)

It is more convenient than using the class MinMaxScaler.

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.minmax_scale.html#sklearn.preprocessing.minmax_scale
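A minimal sketch of how it might be applied to a 1-D array (the data values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import minmax_scale

rvalue = np.array([3.0, 7.0, 5.0, 11.0])  # illustrative data
normal = minmax_scale(rvalue)  # feature_range defaults to (0, 1)
print(normal)  # minimum maps to 0.0, maximum to 1.0
```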

– R Zhang
    what is sklearn? Answers are generally less accepted if users have to ask for clarifications. I suggest adding more of an explanation to your answer so it isn't flagged as low quality. – Danoram Oct 11 '19 at 00:16
7

NumPy provides a built-in function, numpy.ptp() (peak-to-peak), for finding the range of an array, so your question can be addressed by:

import numpy as np

# First, filter input_array so that it does not contain NaN or Inf.
input_array = np.array(some_data)
if np.unique(input_array).shape[0] == 1:
    pass  # handle the degenerate case where input_array is constant
else:
    result_array = (input_array - np.min(input_array)) / np.ptp(input_array)
# To extend this to higher dimensions, pass the axis= kwarg to np.min and np.ptp
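For example, with some made-up data:

```python
import numpy as np

input_array = np.array([4.0, 8.0, 6.0, 12.0])  # illustrative values
# np.ptp gives max - min, i.e. the range of the array
result_array = (input_array - np.min(input_array)) / np.ptp(input_array)
print(result_array)  # minimum maps to 0, maximum to 1
```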
– CT Zhu
2

I tried to simplify things a little. Try this:

oldmin = min(rvalue)
oldmax = max(rvalue)
oldrange = oldmax - oldmin
newmin = 0.
newmax = 1.
newrange = newmax - newmin
if oldrange == 0:            # Deal with the case where rvalue is constant:
    if oldmin < newmin:      # If rvalue < newmin, set all rvalue values to newmin
        newval = newmin
    elif oldmin > newmax:    # If rvalue > newmax, set all rvalue values to newmax
        newval = newmax
    else:                    # If newmin <= rvalue <= newmax, keep rvalue the same
        newval = oldmin
    normal = [newval for v in rvalue]
else:
    scale = newrange / oldrange
    normal = [(v - oldmin) * scale + newmin for v in rvalue]

plt.plot(range(len(rvalue)), normal)

The only reason I can see for the ZeroDivisionError is if the data in rvalue were constant (all values are the same). Is that the case?

– Brionius
  • Yeah, I see that for some cases rvalue is constant and so oldrange = 0. I also figured out that for most of the data sets my rvalue plot stays in the range [0, 1], so I guess there won't be a need to normalize this plot, only the other one. But I was wondering: in order to make my code work for all kinds of data sets (in which rvalue isn't in the range [0, 1]), is there any way to normalize without getting an error?... – pypro Aug 22 '13 at 13:33
  • @user2690054: Sure, you just have to decide what the behavior should be. For example, if `rvalue = [-20, -20, ... , -20]`, should that be mapped to `[0.0, 0.0, ..., 0.0]`? And should `rvalue = [30, 30, ..., 30]` be mapped to `[1.0, 1.0, ..., 1.0]`? – Brionius Aug 22 '13 at 13:43
  • @user2690054 I added some statements to deal with oldrange being zero - see if it does what you want. – Brionius Aug 22 '13 at 13:48
  • I think the modifications should suit my requirement, depending on what behavior I want, as you mentioned. Thanks a lot for that... the only glitch is that the line `scale = newrange / oldrange` should be in the else part, because it raises the ZeroDivisionError at that point and never enters the if clause. Thanks for helping! – pypro Aug 22 '13 at 16:09
1

Just to provide some background for the other answers, here's a derivation:

A straight line through points (x1, y1) and (x2, y2) can be expressed as:

y = y1 + slope * (x - x1)

where

slope = (y2 - y1) / (x2 - x1)

now, normalization from 0 to 1 implies

y1 = 0, y2 = 1

and

x1 = x_min, x2 = x_max

(or vice versa, depending on your needs)

the equation then reduces to

y = (x - x_min) / (x_max - x_min)
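In code, the final formula might look like this (with hypothetical data):

```python
x = [2.0, 4.0, 6.0, 10.0]  # hypothetical data
x_min, x_max = min(x), max(x)
# y = (x - x_min) / (x_max - x_min), applied elementwise
y = [(v - x_min) / (x_max - x_min) for v in x]
print(y)  # [0.0, 0.25, 0.5, 1.0]
```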
– djvg
1

I prefer the preprocessing tools from scikit-learn, similar to Marissa Novak's and R Zhang's answers, though I like a different structure:

import numpy as np
from sklearn import preprocessing

# data
years = [1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1984, 1986, 1989,
         1993, 1994, 1997, 1998, 1999, 2002, 2004, 2010, 2017, 2018, 2021, 2022]

# specify the range to which you want to scale
rng = (0, 1) 

# initiate the scaler
# 0,1 is the default feature_range and doesn't have to be specified
scaler = preprocessing.MinMaxScaler(feature_range=(rng[0], rng[1]))

# apply the scaler
normed = scaler.fit_transform(np.array(years).reshape(-1, 1))

# the output is an array of arrays, so tidy the dimensions
norm_lst = [round(i[0],2) for i in normed]

While this is more verbose than R Zhang's answer and less preferable for the original use case with a "huge" data set, I prefer it for readability in most of my applications (<10^3 values).

rng = (0,1) yields:

[0.0, 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.24, 0.28, 0.34, 0.42, 0.44, 0.5, 0.52, 0.54, 0.6, 0.64, 0.76, 0.9, 0.92, 0.98, 1.0]

rng = (0.3,0.8), for example, yields:

[0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.42, 0.44, 0.47, 0.51, 0.52, 0.55, 0.56, 0.57, 0.6, 0.62, 0.68, 0.75, 0.76, 0.79, 0.8]
– CreekGeek
0

You can divide each number in your sample by the sum of all the numbers in the sample. Note that the results then sum to 1 rather than spanning the full [0, 1] range, and they only stay within [0, 1] if all values are nonnegative.
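A sketch with illustrative, nonnegative data (with negative values the results can fall outside [0, 1]):

```python
data = [1.0, 2.0, 3.0, 4.0]  # illustrative nonnegative values
total = sum(data)
normal = [v / total for v in data]
print(normal)  # [0.1, 0.2, 0.3, 0.4]
```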

-3

A simple way to map values into the range [0, 1] is to divide every value by the maximum value. Note that this only keeps values within [0, 1] when all of them are nonnegative, and the minimum only maps to 0 if it is already 0.
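A sketch, assuming all values are nonnegative (the data here is made up):

```python
data = [2.0, 5.0, 10.0]  # illustrative nonnegative values
m = max(data)
normal = [v / m for v in data]
print(normal)  # [0.2, 0.5, 1.0]
```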

– Jay Dangar