Pandas merge dataframe with condition

Question

Suppose that I have df1 and df2 that looks like this:

    df1:
           start  end group
    index                  
    a          1    3     x
    a          3    6     x
    a          6    9     z
    b          1    7     y
    b          7   15     x
    b         15   17     y
    c          1    4     z
    c          4    9     z
    c          9   15     z
        
    df2:
           value
    index       
    a        1.0
    a        4.8
    a        7.0
    b        2.0
    b        5.0
    b        6.0
    c        2.0
    c        3.0
    c       14.0

I want to make a new column in df2 called group and assign it by condition that refers df2.value to df1.start and df1.end such as:

if df2.value >= df1.start and df2.value < df1.end:
    df2.group = df1.group

So in index 'a' for example, range 1-3=x; 3-6=x; 6-9=z, in index 'b', range 1-7=y; 7-15=x; 15-17=y, and in index 'c', range 1-4=z; 4-9=z; 9-15=z

so the results will be like this:

df2:
       value group
index             
a        1.0     x
a        4.8     x
a        7.0     z
b        2.0     y
b        5.0     y
b        6.0     y
c        2.0     z
c        3.0     z
c       14.0     z

How can I achieve this?

@Datanovice in this case the condition is unique for every index. example in index a: 1-3=x; 3-6=x; 6-9=z — greensquare68, Jul 06 '20 at 08:44

score 0 · Answer 1 · answered Jul 06 '20 at 08:00

Assuming, both the dataframes have equal length, you can achieve this easily with np.where

np.where((df2.value>=df1.start) & (df2.value<df1.end), df2.group=df1.group, df2.group=0)

This evaluates your condition, and if it is False, df2.group is set to 0.

score 0 · Answer 2 · answered Jul 06 '20 at 08:01

0

Try this approach:

list_in = []
for i in range(0, df2.shape[0]):
    if df1.iloc[i]['start']<=df2.iloc[i]['value']<df1.iloc[i]['end']:
        list_in.append(df1.iloc[i]['group'])
    else:
        list_in.append('Nan')
df2 = df2.assign(group = list_in)

answered Jul 06 '20 at 08:01

Tanoy Majumdar

100
9

this loop only evaluate by the start and end of individual rows so in df2.iloc[4]['group] it returns NaN instead of 'y' – greensquare68 Jul 06 '20 at 08:30
Yes, since df2.iloc[4]['group'] is not greater than df1.iloc[4]['group'] it should be nan. This is what I understood from the question. – Tanoy Majumdar Jul 06 '20 at 08:33
I edit the question for more clarity – greensquare68 Jul 06 '20 at 08:42

Pandas merge dataframe with condition

2 Answers2