I have two fairly large DataFrames (approximately 2,000,000 rows).
They have two columns in common ('CD' and 'BA'), and I would like to join the DataFrames on those two columns.
I have tried several solutions, but so far they all take far too long (more than 7 seconds):
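# 1) plain merge on both key columns
affect = df1.merge(df2, on=['BA', 'CD'], how='left')
# 2) set both keys as a MultiIndex, then join on the index
affect = df1.set_index(['BA', 'CD']).join(df2.set_index(['BA', 'CD']), how='left')
# 3) same as 2), but setting the index in place first
df1.set_index(['BA', 'CD'], inplace=True)
df2.set_index(['BA', 'CD'], inplace=True)
affect = df1.join(df2, how='left')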
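To make the timings reproducible, here is a minimal sketch that builds synthetic data of a similar size and times the first two approaches (the third performs the same operations, only with the index set in place). The helper names `make_frames` and `time_joins`, the value columns `x` and `y`, and the integer key ranges are hypothetical stand-ins for the real data; real key dtypes (for example string keys) may change the timings considerably.

```python
import time

import numpy as np
import pandas as pd


def make_frames(n_rows, seed=0):
    """Build two hypothetical DataFrames sharing the key columns 'BA' and 'CD'.

    The key ranges and the value columns 'x' and 'y' are assumptions made
    only to get a reproducible example; the real data may differ.
    """
    rng = np.random.default_rng(seed)
    df1 = pd.DataFrame({
        "BA": rng.integers(0, 1000, n_rows),
        "CD": rng.integers(0, 1000, n_rows),
        "x": rng.random(n_rows),
    })
    # Keep only unique (BA, CD) pairs in df2 so the left join cannot add rows
    df2 = pd.DataFrame({
        "BA": rng.integers(0, 1000, n_rows),
        "CD": rng.integers(0, 1000, n_rows),
        "y": rng.random(n_rows),
    }).drop_duplicates(subset=["BA", "CD"])
    return df1, df2


def time_joins(df1, df2):
    """Run approaches 1) and 2) and return both results plus wall-clock times."""
    timings = {}

    start = time.perf_counter()
    merged = df1.merge(df2, on=["BA", "CD"], how="left")
    timings["merge"] = time.perf_counter() - start

    start = time.perf_counter()
    joined = df1.set_index(["BA", "CD"]).join(
        df2.set_index(["BA", "CD"]), how="left"
    )
    timings["set_index + join"] = time.perf_counter() - start

    return merged, joined, timings


if __name__ == "__main__":
    frame1, frame2 = make_frames(2_000_000)
    _, _, results = time_joins(frame1, frame2)
    for name, seconds in results.items():
        print(f"{name}: {seconds:.2f} s")
```

Both approaches should return the same rows; only the key columns end up as regular columns in one case and as a MultiIndex in the other.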
Do you have any idea how to speed this up?