Iterate over dataframes and merge by conditions

Question

I have a dataframe:

  id-input     id-output       Date         Price   Type 
    1            3           20/09/2020      100     ABC
    2            1           20/09/2020      200     ABC
    2            1           21/09/2020      300     ABC
    1            3           21/09/2020      50      AD
    1            2           21/09/2020      40      AD

I want to get this output:

    id-inp-ABC  id-out-ABC  Date-ABC    Price-ABC  Type-ABC  id-inp-AD  id-out-AD  Date-AD     Price-AD  Type-AD
    1           3           20/09/2020  10         ABC       2          1          20/09/2020  10        AD
    1'          3           20/09/2020  90         ABC       NaN        NaN        NaN         NaN       NaN
    2           1           20/09/2020  40         ABC       1          2          21/09/2020  40        AD
    2'          1           20/09/2020  160        ABC       NaN        NaN        NaN         NaN       NaN
    2           1           21/09/2020  300        ABC       NaN        NaN        NaN         NaN       NaN

My idea is to:

- divide the dataframe into two dataframes by Type;
- iterate through both dataframes and check whether id-input == id-output;
- check whether the prices are equal and, if not, split the row and subtract the price; then rename the columns and merge them.
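The three steps above can be sketched as follows. This is a minimal sketch built on assumptions the question does not spell out: each AD row is paired at most once, an ABC row is paired with the first unused AD row whose id-output equals the ABC id-input, the paired part keeps the smaller of the two prices, and any positive remainder becomes its own unpaired row. The helper name match_and_split is hypothetical, and because the column names come from add_suffix they read id-input-ABC rather than id-inp-ABC.

```python
import pandas as pd


def match_and_split(df, left_type="ABC", right_type="AD"):
    # Hypothetical helper: pairs left_type rows with right_type rows and
    # splits the left price when the two prices differ (assumed rules).
    # Step 1: divide the frame into one sub-frame per Type
    parts = {t: g.reset_index(drop=True) for t, g in df.groupby("Type")}
    left, right = parts[left_type], parts[right_type]

    used = set()  # positions of right rows that are already paired
    rows = []
    for _, l_row in left.iterrows():
        left_cols = l_row.add_suffix(f"-{left_type}")
        # Step 2: first unused right row whose id-output equals this id-input
        match = next(
            (j for j, r in right.iterrows()
             if j not in used and r["id-output"] == l_row["id-input"]),
            None,
        )
        if match is None:
            rows.append(left_cols.to_dict())  # no partner: AD columns stay NaN
            continue
        used.add(match)
        r_row = right.loc[match]
        # Step 3: the paired part carries the smaller of the two prices ...
        paired = left_cols.copy()
        paired[f"Price-{left_type}"] = min(l_row["Price"], r_row["Price"])
        rows.append({**paired.to_dict(), **r_row.add_suffix(f"-{right_type}").to_dict()})
        # ... and a positive remainder becomes its own unpaired row
        rest = l_row["Price"] - r_row["Price"]
        if rest > 0:
            leftover = left_cols.copy()
            leftover[f"Price-{left_type}"] = rest
            rows.append(leftover.to_dict())
    return pd.DataFrame(rows)


df = pd.DataFrame(
    [
        [1, 3, "20/09/2020", 100, "ABC"],
        [2, 1, "20/09/2020", 200, "ABC"],
        [2, 1, "21/09/2020", 300, "ABC"],
        [1, 3, "21/09/2020", 50, "AD"],
        [1, 2, "21/09/2020", 40, "AD"],
    ],
    columns=["id-input", "id-output", "Date", "Price", "Type"],
)
out = match_and_split(df)
print(out)
```

Collecting plain dicts in a list and building the DataFrame once at the end avoids overwriting intermediate results inside the loop; missing AD keys simply become NaN.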

import pandas as pd

# df is the input dataframe shown above
# Split the frame into one sub-frame per Type (ABC first, then AD)
grp = df.groupby('Type')

transformed_df_list = []

for idx, frame in grp:
    frame = frame.reset_index(drop=True)
    transformed_df_list.append(frame.copy())
ABC = transformed_df_list[0]
AD = transformed_df_list[1]
for i, row in ABC.iterrows():
    for j, row1 in AD.iterrows():
        if row['id-input'] == row1['id-output']:
            row_df = pd.DataFrame([row1])
            row_df = row_df.rename(columns={'id-input': 'id-inp-AD', 'id-output': 'id-out-AD', 'Date': 'Date-AD', 'Price': 'Price-AD'})
            output = pd.merge(ABC, row_df, how='left', left_on=['id-input'], right_on=['id-inp-AD'])
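In the loop above, row_df is rebuilt from a single row on every match and output is reassigned each time, so only the last match survives; the merge also joins on id-inp-AD, while the if statement compares against the AD output id. A minimal sketch of a loop-free alternative, assuming the pairing rule is "ABC id-input equals AD id-output": tag each sub-frame's columns with add_suffix and do a single left join.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [1, 3, "20/09/2020", 100, "ABC"],
        [2, 1, "20/09/2020", 200, "ABC"],
        [2, 1, "21/09/2020", 300, "ABC"],
        [1, 3, "21/09/2020", 50, "AD"],
        [1, 2, "21/09/2020", 40, "AD"],
    ],
    columns=["id-input", "id-output", "Date", "Price", "Type"],
)

# Suffix every column with its Type so both sides can sit in one row
abc = df[df["Type"] == "ABC"].add_suffix("-ABC")
ad = df[df["Type"] == "AD"].add_suffix("-AD")

# One left join replaces the nested loops: every ABC row is kept, and the AD
# columns are filled where ABC id-input equals AD id-output (NaN otherwise)
output = abc.merge(ad, how="left", left_on="id-input-ABC", right_on="id-output-AD")
print(output)
```

Note that a plain merge attaches the same AD row to every ABC row with a matching id and does not split prices; the one-time pairing and price split in the desired output still need row-by-row logic.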

But the result is NaN in the id-inp-AD, id-out-AD, Date-AD, Price-AD and Type-AD columns, and row_df contains only the last row:

1            2           21/09/2020      40      AD

I also want the iteration to respect the original row order, and each row inserted into the output dataframe to be sorted by date.
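Because Date holds day-first strings, sorting them as text would put 05/10/2020 before 21/09/2020. A minimal sketch, assuming the dates always follow the dd/mm/yyyy pattern shown: parse them with pd.to_datetime and an explicit format, then sort with a stable algorithm (mergesort) so rows that share a date keep their original order. The small frame below is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample rows; the first date is later than all the others
df = pd.DataFrame(
    {
        "Date": ["05/10/2020", "21/09/2020", "20/09/2020", "21/09/2020"],
        "Price": [1, 2, 3, 4],
    }
)

# Parse day-first strings so sorting is chronological rather than lexicographic
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# mergesort is stable: rows with the same date keep their original relative order
df_sorted = df.sort_values("Date", kind="mergesort").reset_index(drop=True)
print(df_sorted)
```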

score 0 · Answer 1 · answered Dec 21 '20 at 12:11

The most elegant way to solve your problem is to use pandas.DataFrame.pivot. You end up with multi-level column names instead of single-level ones. If you need to convert the DataFrame back to single-level column names, check the second answer here.

import pandas as pd

data = [
    [1, 3, '20/09/2020', 100, 'ABC'],
    [2, 1, '20/09/2020', 200, 'ABC'],
    [2, 1, '21/09/2020', 300, 'ABC'],
    [1, 3, '21/09/2020', 50, 'AD'],
    [1, 2, '21/09/2020', 40, 'AD']
]
df = pd.DataFrame(data=data, columns=["id-input", "id-output", "Date", "Price", "Type"])
df_pivot = df.pivot(columns=["Type"])
print(df_pivot)

Output

     id-input      id-output             Date              Price      
Type      ABC   AD       ABC   AD         ABC          AD    ABC    AD
0         1.0  NaN       3.0  NaN  20/09/2020         NaN  100.0   NaN
1         2.0  NaN       1.0  NaN  20/09/2020         NaN  200.0   NaN
2         2.0  NaN       1.0  NaN  21/09/2020         NaN  300.0   NaN
3         NaN  1.0       NaN  3.0         NaN  21/09/2020    NaN  50.0
4         NaN  1.0       NaN  2.0         NaN  21/09/2020    NaN  40.0
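The pivot keeps the original row index, so each row still carries values for one Type only; flattening changes just the column labels. If single-level labels in the question's style (for example Price-ABC) are wanted, one common approach, sketched here as an assumption rather than as the linked answer's exact code, is to join the two levels of each column tuple.

```python
import pandas as pd

data = [
    [1, 3, "20/09/2020", 100, "ABC"],
    [2, 1, "20/09/2020", 200, "ABC"],
    [2, 1, "21/09/2020", 300, "ABC"],
    [1, 3, "21/09/2020", 50, "AD"],
    [1, 2, "21/09/2020", 40, "AD"],
]
df = pd.DataFrame(data=data, columns=["id-input", "id-output", "Date", "Price", "Type"])
df_pivot = df.pivot(columns=["Type"])

# Each column label is a (value column, Type) tuple, e.g. ("Price", "AD")
df_pivot.columns = [f"{col}-{typ}" for col, typ in df_pivot.columns]
print(df_pivot)
```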
