1

Suppose there are two dataframes that share the same index but have different columns. Is it smarter to merge the two dataframes here or concat?

import pandas as pd
from pandas import DataFrame

df1 = DataFrame(index = ['hey', 'yo'], columns = ['gee', 'thanks'], data = [[1,'foo'],[6,'rhy']]) 
df2 = DataFrame(index = ['hey', 'yo'], columns = ['youre', 'welcome'], data = [[8,'fotb'],[3,'yuo']])

#using merging
df3_merge = df1.merge(df2,left_index = True, right_index = True)  

#result:      
#             gee  thanks  youre  welcome
# hey          1    foo      8    fotb
# yo           6    rhy      3     yuo

#using concatenate
df3_concat = pd.concat([df1,df2], axis = 1)  

#result:      
#             gee  thanks  youre  welcome
# hey          1    foo      8    fotb
# yo           6    rhy      3     yuo

This link inspired this question. Typically I have always used concat, but I am curious to what others use or think.

StonedTensor
  • 610
  • 5
  • 19

1 Answers1

1

I think it depends, what need.

By default, in DataFrame.merge is inner join, but is possible change it to outer, right or left.

df3_merge = df1.merge(df2,left_index = True, right_index = True)  

Be default in concat is outer join, but is possible change it to inner only by inner parameter:

df3_concat = pd.concat([df1,df2], axis = 1)

Also if want join list of DataFrames, simplier and faster is concat method.

If want left join, concat cannot be used, because not implemented.


More information about concat.

More information about merge.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252