Creating a function which creates a new column based on values in two columns?

Question

I have a data frame like this:

ID   Min_Value   Max_Value
1    0           0.10562
2    0.10563     0.50641
3    0.50642     1.0

I have another data frame that contains a Value column. I want to create a new column in the second data frame that holds the ID whose Min_Value to Max_Value range (from the data frame above) contains Value. I could use if-else conditions, but the number of IDs is large and the code becomes too bulky. Is there an efficient way to do this?

Looks like [this previous answer](https://stackoverflow.com/questions/49382207/how-to-map-numeric-data-into-categories-bins-in-pandas-dataframe) solves this problem. — 4D45, Sep 23 '21 at 06:42
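
A minimal sketch of the binning idea from that link, assuming the ranges do not overlap and both bounds are inclusive. The frame names `ranges` and `values` and the sample numbers in `values` are illustrative and not from the question; only the column names are.

```python
import pandas as pd

# Range table from the question.
ranges = pd.DataFrame({
    "ID": [1, 2, 3],
    "Min_Value": [0, 0.10563, 0.50642],
    "Max_Value": [0.10562, 0.50641, 1.0],
})

# Hypothetical second data frame; 1.5 is deliberately outside every range.
values = pd.DataFrame({"Value": [0.05, 0.3, 0.75, 1.5]})

# One closed interval [Min_Value, Max_Value] per ID.
intervals = pd.IntervalIndex.from_arrays(
    ranges["Min_Value"], ranges["Max_Value"], closed="both"
)

# Position of the interval containing each value, or -1 when no interval matches.
pos = intervals.get_indexer(values["Value"])

# Translate positions into IDs and blank out the unmatched rows.
ids = pd.Series(ranges["ID"].to_numpy()[pos], index=values.index)
values["ID"] = ids.where(pos >= 0)

print(values)
```

`get_indexer` raises an error if the intervals overlap, so this only applies when each value can belong to at most one ID.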

Ivan Shelonik · Accepted Answer · 2021-09-23T05:55:04.907

If I understand correctly, you can join/merge the two into one DataFrame and then use the `between` method to build a mask that selects the matching rows of the second DataFrame.

import pandas as pd

data = {"Min_Value": [0, 0.10563, 0.50642], 
        "Max_Value": [0.10562, 0.50641, 1.0]}

df = pd.DataFrame(data, 
                  index=[1, 2, 3])

df2 = pd.DataFrame({"Value": [0, 0.1, 0.58]}, index=[1,2,3])

df = df.join(df2)

# inclusive="both" keeps values equal to a bound (e.g. Value 0 with Min_Value 0)
mask_between_values = df['Value'].between(df['Min_Value'], df['Max_Value'], inclusive="both")

# This is the result
df2[mask_between_values]

   Value
1   0.00
3   0.58
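
The join above pairs rows by index, so each Value is only compared with the range that shares its index. If every Value should be tested against every range, one possible extension is a cross join followed by the same `between` filter. This is a sketch assuming pandas 1.3 or later (for `how="cross"` and string values of `inclusive`); the names `ranges`, `values`, `pairs` and `hits` and the extra sample value 0.3 are illustrative.

```python
import pandas as pd

# Range table from the question, with ID as an ordinary column.
ranges = pd.DataFrame({
    "ID": [1, 2, 3],
    "Min_Value": [0, 0.10563, 0.50642],
    "Max_Value": [0.10562, 0.50641, 1.0],
})

# Hypothetical second data frame; it may have any number of rows.
values = pd.DataFrame({"Value": [0, 0.1, 0.58, 0.3]})

# Pair every value with every range, keeping the original row label in "index".
pairs = values.reset_index().merge(ranges, how="cross")

# Keep only the pairs where Value lies inside the range (bounds included).
in_range = pairs["Value"].between(
    pairs["Min_Value"], pairs["Max_Value"], inclusive="both"
)
hits = pairs[in_range]

# Map the matched ID back to the original rows; unmatched rows would get NaN.
values["ID"] = hits.set_index("index")["ID"].reindex(values.index)

print(values)
```

The cross join builds len(values) * len(ranges) rows, so it is simple but can use a lot of memory when both frames are large.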

Sunjid · Answer 2 · 2021-09-23T06:58:44.383

Suppose you have two dataframes, df and new_df, and you want to add a column 'new_column' to new_df. The value in the 'Value' column must lie between 'Min_Value' and 'Max_Value' from df. Then this code may help you.

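# Compares row i of df with row i of new_df (assumes both use a default 0-based index).
# Iterate over the shorter frame so that no missing row label is looked up.
for i in range(min(len(df), len(new_df))):
    # Strict inequalities: a Value equal to either bound is not matched.
    if df.loc[i, 'Min_Value'] < new_df.loc[i, 'Value'] < df.loc[i, 'Max_Value']:
        new_df.loc[i, 'new_column'] = df.loc[i, 'ID']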

edited Sep 23 '21 at 06:58

answered Sep 23 '21 at 05:39


Got two syntax errors with this: missing ' following both Max_Value and Min_Value. Once those were fixed, got error: `ValueError: 3 is not in range`. (There were four rows in my `new_df` data.) – 4D45 Sep 23 '21 at 06:20
You got this error because your df dataframe has only 3 rows. new_df's length is greater than df's length. You can use df instead of new_df for the loop iteration. – Sunjid Sep 23 '21 at 06:39
I have edited the code. Check it out. Please use the length of the shorter dataframe for the loop iteration so that you don't get a ValueError. – Sunjid Sep 23 '21 at 06:46
It still doesn't appear to do what I think the OP requires. If I use this input data (for new_df) `valuesData = pd.DataFrame([0.55, 0.123, 0.9, 0.3], columns=["Value"])` then it ends up with NaN values for the 1st and last items in new_df, and instead of producing a set of ID numbers, it just duplicates the Value numbers for the 2nd and 3rd rows. – 4D45 Sep 23 '21 at 06:50
I misunderstood at first. I thought you wanted the data that lies between Min_Value and Max_Value, but you just want the ID. I have edited the code. Please check it out. – Sunjid Sep 23 '21 at 06:58
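
For the test input from the comments, the result the OP seems to be after would be one ID per row. A minimal sketch of a lookup function applied to each Value, assuming inclusive bounds and non-overlapping ranges; `find_id` is a hypothetical helper and not part of the answer's code.

```python
import pandas as pd

# Range table from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Min_Value": [0, 0.10563, 0.50642],
    "Max_Value": [0.10562, 0.50641, 1.0],
})

# Test input taken from the comment thread.
new_df = pd.DataFrame([0.55, 0.123, 0.9, 0.3], columns=["Value"])


def find_id(value, ranges=df):
    """Return the ID of the first range containing value, or None if there is none."""
    match = ranges[(ranges["Min_Value"] <= value) & (ranges["Max_Value"] >= value)]
    return match["ID"].iloc[0] if not match.empty else None


# Every Value is checked against all ranges, so the two frames may differ in length.
new_df["new_column"] = new_df["Value"].apply(find_id)

print(new_df)
```

This scans the whole range table once per value, which is easy to read but can be slow when both frames are large.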
