1

I have multiple dataframes df1, df2, df3,... that i want to merge based on the index. All of these have some columns that are duplicates in other dataframes.

For example:

In[1]: df1
Out[2]: 
            col1       col2
1697352     Other      Dynamic
1698918     Other      Other
1698957     Static     Dynamic
1698645     Static     Static

In[3]: df2
Out[4]: 

            col1       col2
1697352     Other      Force
1698918     Other      Power
1698957     Static     Force
1698645     Static     Power

In[5]: df3
Out[6]: 

            col1       col2      col3
1697352     Left       Force     Other
1698918     Right      Power     Other
1698957     Middle     Force     Static
1698645     Left       Power     Static

...

I would want something like this:

In[7]: df_merged
Out[8]: 
            col1       col2       col3      col4
1697352     Other      Dynamic    Force     Left
1698918     Other      Other      Power     Right
1698957     Static     Dynamic    Force     Middle
1698645     Static     Static     Power     Left

...

With two dataframes I have used,

cols_merged = df1.columns.differences(df2.columns)

and then merge them with

pd.merge(df1, df2[cols_merged],  left_index=True, right_index=True)

Is there a way of doing this in on go?

Maxi Trien
  • 27
  • 4

3 Answers3

1

You could actually create a dataframe from the extracted data, as suggested in this stackoverflow solution. If this is not an option for you, you could follow this documentation.

For example,you could try this:

frames=[df1,df2,df3]
pd.concat(frames,axis=1,join="outer",ignore_index=False)

I didn't get to test this solution, though. Tell me if this has worked for you.

MariCruzR
  • 147
  • 8
  • Your solution doesn't merge the dataframes on the index column. Then I would get 3 different rows for each index. But I will look into the other references. Thanks – Maxi Trien Aug 25 '21 at 13:15
  • Change axis=1, sorry for that. I'm editing this on my solution. – MariCruzR Aug 25 '21 at 13:17
1
df1.merge(df2).merge(df3)

Result: enter image description here

Niv Dudovitch
  • 1,614
  • 7
  • 15
1
  1. concat all the DataFrames.
  2. transpose and drop_duplicates to remove columns with duplicate values
  3. transpose to restore original structure and rename columns as needed
merged = pd.concat([df1, df2, df3], axis=1)
output = merged.T.drop_duplicates().T
output.columns = [f"col{i+1}" for i in range(len(output.columns))]

>>> output
           col1     col2   col3    col4
1697352   Other  Dynamic  Force    Left
1698918   Other    Other  Power   Right
1698957  Static  Dynamic  Force  Middle
1698645  Static   Static  Power    Left
not_speshal
  • 22,093
  • 2
  • 15
  • 30