I have two dataframes that are quite big (approximately 2,000,000 rows).

They both have two columns in common ('CD' and 'BA') and I would like to join my dataframes on those two columns.

I have tried several approaches, but so far they all take way too long (more than 7 seconds):

affect = df1.merge(df2, on=['BA', 'CD'], how='left')

affect = df1.set_index(['BA', 'CD']).join(df2.set_index(['BA', 'CD']), how='left')

df1.set_index(['BA', 'CD'], inplace=True)
df2.set_index(['BA', 'CD'], inplace=True)
affect = df1.join(df2, how='left')
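To make the timings easier to reproduce, here is a minimal sketch of the first two approaches on synthetic data. The value columns `X` and `Y`, the key ranges, the row count `n`, and the `timed` helper are all assumptions for illustration and are not taken from my real data, which is about ten times larger.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data: the real frames have about 2,000,000 rows.
# The value columns 'X' and 'Y' and the key ranges are made up.
n = 200_000
rng = np.random.default_rng(0)
df1 = pd.DataFrame({
    "BA": rng.integers(0, 1_000, n),
    "CD": rng.integers(0, 1_000, n),
    "X": rng.random(n),
})

# Assumption: the key pairs in df2 are unique, so a left join
# does not multiply the rows of df1.
df2 = df1[["BA", "CD"]].drop_duplicates().reset_index(drop=True)
df2["Y"] = rng.random(len(df2))


def timed(label, func):
    """Run func once, print the elapsed wall-clock time, return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


# Approach 1: merge on the two shared columns.
merged = timed(
    "merge",
    lambda: df1.merge(df2, on=["BA", "CD"], how="left"),
)

# Approach 2: build a MultiIndex on both frames, then join on it.
joined = timed(
    "set_index + join",
    lambda: df1.set_index(["BA", "CD"]).join(
        df2.set_index(["BA", "CD"]), how="left"
    ),
)
```

Both calls should return one row per row of `df1` under the uniqueness assumption above; the timings depend on the machine and on how many distinct key pairs there are.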

Do you have any idea how to speed things up?

cs95
Louis
