Need to perform the following operation on a pandas dataframe df
inside a for loop with 50 iterations or more:
Column'X' of df has to be merged with column 'X' of df1,
Column'Y' of df has to be merged with column 'Y' of df2,
Column'Z' of df has to be merged with column 'Z' of df3,
Column'W' of df has to be merged with column 'W' of df4
The columns which are common in all 5 dataframes - df, df1, df2, df3 and df4 are A, B, C and D
EDIT
The shape of all dataframes is different from one another where df
is the master dataframe having maximum number of rows and rest all other 4 dataframes have number of rows less than df but varying from each other. So while merging columns need to make sure that rows from both dataframes are matched first.
Input df
A B C D X Y Z W
1 2 3 4 nan nan nan nan
2 3 4 5 nan nan nan nan
5 9 7 8 nan nan nan nan
4 8 6 3 nan nan nan nan
df1
A B C D X Y Z W
2 3 4 5 100 nan nan nan
4 8 6 3 200 nan nan nan
df2
A B C D X Y Z W
1 2 3 4 nan 50 nan nan
df3
A B C D X Y Z W
1 2 3 4 nan nan 1000 nan
4 8 6 3 nan nan 2000 nan
df4
A B C D X Y Z W
2 3 4 5 nan nan nan 25
5 9 7 8 nan nan nan 35
4 8 6 3 nan nan nan 45
Output df
A B C D X Y Z W
1 2 3 4 nan 50 1000 nan
2 3 4 5 100 nan nan 25
5 9 7 8 nan nan nan 35
4 8 6 3 200 nan 2000 45
Which is the most efficient and fastest way
to achieve it? Tried using 4 separate combine_first
statements but that doesn't seem to be the most efficient way.
Can this be done by using just 1 line of code instead?
Any help will be appreciated. Many thanks in advance.