Normalize columns in pandas data frame while one column is in a specific range

Question

I have a pandas data frame that contains my experimental data. It looks like this:

KE  BE  EXP_DATA  COL_1  COL_2  COL_3 ...
10  1   5         1      2      3   
9   2   .         .      .      .
8   3   .         .
7   4
6   5
.
.

The column KE is not used. BE holds the values for the x-axis, and all other columns are y-axis values. For normalization I use the idea presented in Michael Aquilina's post (linked as "Normalize"). Therefore I need to find the maximum and the minimum of my data. I do it like this:

    minBE = self.data['EXP_DATA'].min()
    maxBE = self.data['EXP_DATA'].max()

Now I want to find the maximum and minimum of the EXP_DATA column, but only over the rows where the BE column lies in a certain range. In essence, I want to normalize the data only within a certain x-range.

Solution

Thanks to the solution Milo gave me I now use this function:

def normalize(self, BE="Exp", NRANGE=False):
    """
    Min-max normalize all columns using the min and max of column BE.

    If NRANGE=[a, b] is given, the min and max are taken only from rows
    whose index lies strictly between a and b; all other rows are masked (NaN).
    """
    if BE not in self.data.columns:
        raise NameError("'{}' is not an existing column. ".format(BE) +
                        "Try list_columns()")
    if NRANGE and len(NRANGE) == 2:
        upper_be = max(NRANGE)
        lower_be = min(NRANGE)
        # Rows whose index (the x values) lies strictly inside NRANGE.
        msk = (self.data.index > lower_be) & (self.data.index < upper_be)
        minBE = self.data[BE][msk].min()
        maxBE = self.data[BE][msk].max()
        # Mask everything outside NRANGE so the data in NRANGE is really scaled to [0, 1].
        for col in self.data.columns:
            self.data[col] = self.data[col][msk]
    else:
        minBE = self.data[BE].min()
        maxBE = self.data[BE].max()

    for col in self.data.columns:
        self.data[col] = (self.data[col] - minBE) / (maxBE - minBE)

If I call the function with NRANGE=[a, b], where a and b are also the x-limits of my plot, it scales the visible y-values using the minimum and maximum of the reference column within that range, so that column runs from 0 to 1 in the plot; the rest of the data is masked. If the function is called without the NRANGE parameter, the whole range of the data passed to the function is scaled from 0 to 1.
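For readers without the surrounding class, the same idea can be sketched as a standalone function. This is only an illustration under stated assumptions: the name `normalize_in_range`, its parameters, and the sample values are hypothetical, the x values are assumed to be stored in the index (as in the method above), and all columns are assumed to be numeric. Unlike the method, it returns a new data frame instead of modifying the data in place.

```python
import pandas as pd


def normalize_in_range(data, ref_col, x_range=None):
    """Min-max scale every column by the min and max of ref_col.

    If x_range=(a, b) is given, the min and max come only from rows whose
    index lies strictly between a and b; rows outside are set to NaN.
    """
    if ref_col not in data.columns:
        raise KeyError("'{}' is not an existing column".format(ref_col))

    # Work on a float copy so NaN can be stored and the input stays untouched.
    scaled = data.astype(float)

    if x_range is not None:
        lower, upper = min(x_range), max(x_range)
        inside = (scaled.index > lower) & (scaled.index < upper)
        scaled.loc[~inside, :] = float("nan")

    # min() and max() skip NaN by default, so masked rows are ignored.
    ref_min = scaled[ref_col].min()
    ref_max = scaled[ref_col].max()
    return (scaled - ref_min) / (ref_max - ref_min)


# Hypothetical sample data with BE as the index.
df = pd.DataFrame(
    {"EXP_DATA": [5.0, 7.0, 9.0, 4.0, 6.0], "COL_1": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.Index([1, 2, 3, 4, 5], name="BE"),
)
result = normalize_in_range(df, "EXP_DATA", x_range=(1, 5))
print(result)
```

Only the reference column is guaranteed to end up in [0, 1]; the other columns are shifted and scaled by the same two numbers and may fall outside that interval. If the reference column is constant within the range, the denominator is zero, which this sketch does not guard against.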

Thank you for your help!

score 2 · Accepted Answer · answered Jul 14 '17 at 09:59

You can use boolean indexing. For example, to select the max and min values in column EXP_DATA where BE is larger than 2 and less than 5:

lower_be = 2
upper_be = 5

max_in_range = self.data['EXP_DATA'][(self.data['BE'] > lower_be) & (self.data['BE'] < upper_be)].max()
min_in_range = self.data['EXP_DATA'][(self.data['BE'] > lower_be) & (self.data['BE'] < upper_be)].min()
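A self-contained version of the snippet above may make it easier to try out. The sample values below are made up for illustration (only the column names come from the question), and the mask is built once and reused through `.loc`, which is a stylistic choice rather than part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
data = pd.DataFrame({
    "BE": [1, 2, 3, 4, 5],
    "EXP_DATA": [5.0, 7.0, 9.0, 4.0, 6.0],
})

lower_be = 2
upper_be = 5

# Both comparisons are strict, so rows with BE == 2 or BE == 5 are excluded.
in_range = (data["BE"] > lower_be) & (data["BE"] < upper_be)

max_in_range = data.loc[in_range, "EXP_DATA"].max()
min_in_range = data.loc[in_range, "EXP_DATA"].min()

print(min_in_range, max_in_range)
```

The parentheses around each comparison are required, because `&` binds more tightly than `>` and `<` in Python.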

Milo

Wow, that was fast and it works! Now I just need to find a way to set the ylim automatically based on the given XRANGE. Instead, the scale is currently set based on the limits of the initial plot or the complete dataset. – NorrinRadd Jul 14 '17 at 10:54
You can do that with `ax.set_ylim(min_in_range, max_in_range)`, where ax is the axes object returned by the plot command. You can find plenty of examples on how to scale/limit the y-axis here on StackOverflow. – Milo Jul 14 '17 at 11:20
While this is a nice idea, in my special case I decided to mask the data. I would only want to implement the above solution if I decide to plot only the data in a specific X-RANGE. – NorrinRadd Jul 14 '17 at 12:56
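The `ax.set_ylim` suggestion from the comments could look roughly like the sketch below. The sample values and the chosen range are hypothetical, and the `Agg` backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
data = pd.DataFrame({
    "BE": [1, 2, 3, 4, 5],
    "EXP_DATA": [5.0, 7.0, 9.0, 4.0, 6.0],
})

lower_be, upper_be = 1, 5
in_range = (data["BE"] > lower_be) & (data["BE"] < upper_be)
min_in_range = data.loc[in_range, "EXP_DATA"].min()
max_in_range = data.loc[in_range, "EXP_DATA"].max()

# DataFrame.plot returns the matplotlib Axes it drew on.
ax = data.plot(x="BE", y="EXP_DATA")
ax.set_xlim(lower_be, upper_be)
ax.set_ylim(min_in_range, max_in_range)
```

This leaves the data untouched and only changes the view, whereas the masking approach in the question changes the data itself.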
