0

I have the following 3 dataframes

df_hl

State   District    Level   Name    TRU No_HH   TOT_HL_P   TOT_HL_M
0   0   India   India   Total   449787  1773040 
0   0   India   India   Rural   192891  834692  
0   0   India   India   Urban   256896  938348  

df_sc

State   District    Level   Name    TRU   No_HH_Head_sc TOT_P_sc
0   0   India   India   Total   41694863    201378372
0   0   India   India   Rural   31803775    153850848   
0   0   India   India   Urban   9891088     47527524        

df_st

State   District    Level   Name    TRU   No_HH_Head_st   TOT_P_st   TOT_M_st
0   0   India   India   Total   21511528    104545716   52547215    
0   0   India   India   Rural   19302332    94083844    47263733    
0   0   India   India   Urban   2209196     10461872    5283482

These are the structures of the dataframes after I have preprocessed their names. There are more number of columns and rows in the original dataset.

The thing I want to do is combine all these dataframes based on the following columns : `State, District, Level, Name and TRU.

My result dataframe should be of the following form :

State   District    Level   Name    TRU No_HH   TOT_HL_P   TOT_HL_M   No_HH_Head_sc  TOT_P_sc  No_HH_Head_st   TOT_P_st   TOT_M_st 
0   0   India   India   Total   449787  1773040   41694863    201378372  21511528    104545716   52547215
0   0   India   India   Rural   192891  834692   31803775    153850848 19302332    94083844    47263733  
0   0   India   India   Urban   256896  938348  41694863    201378372   2209196     10461872    5283482

I tried to create index based on those columns, but don't know how to combine all columns. Any help would be really appreciated.

df_hl.set_index(['State', 'District', 'Level', 'Name', "TRU"], inplace=True)
df_sc.set_index(['State', 'District', 'Level', 'Name', "TRU"], inplace=True)
df_st.set_index(['State', 'District', 'Level', 'Name', "TRU"], inplace=True)

Thanks a lot in advance.

Suraj
  • 2,253
  • 3
  • 17
  • 48

2 Answers2

1

Seems like this what you are looking for:

df1.merge(df2,how='left').merge(df3,how='left')?

Partha Mandal
  • 1,391
  • 8
  • 14
0

What you probably want is an inner joint of the dataframes. Assumed you have pandas imported as import pandas as pd:

df_final = pd.merge(left=pd.merge(left=df_hl, right=df_sc, on=["State", "District", "Level", "Name", "TRU"]), right=df_st, on=["State", "District", "Level", "Name", "TRU"])

A more generic way for doing this with an arbitrary number of dataframes is described here:

pandas three-way joining multiple dataframes on columns

import functools
dfs = [df_hl, df_sc, df_st]
df_final = functools.reduce(lambda left, right: pd.merge(left=left, right=right, on=["State", "District", "Level", "Name", "TRU"]), dfs)

valkyrie
  • 106
  • 6