Seems like combine_first() should do it for you:
df2 = df0.combine_first(df1)
...where df0
takes precedence over df1
when the indicies match. Although in your case, if they're identical, it doesn't really matter. But if they're not identical, that's how combine_first()
works.
The following is an example of it working with dummy data.
Code:
import pandas as pd
import io
a = io.StringIO(u'''
tradeID,amount,date
X001,100,1/1/2016
X002,200,1/2/2016
X003,300,1/3/2016
X005,500,1/5/2016
''')
b = io.StringIO(u'''
tradeID,amount,date
X004,400,1/4/2016
X005,500,1/5/2016
X006,600,1/6/2016
''')
dfA = pd.read_csv(a, index_col = 'tradeID')
dfB = pd.read_csv(b, index_col = 'tradeID')
df = dfA.combine_first(dfB)
Output:
amount date
tradeID
X001 100.0 1/1/2016
X002 200.0 1/2/2016
X003 300.0 1/3/2016
X004 400.0 1/4/2016
X005 500.0 1/5/2016
X006 600.0 1/6/2016
If you really want to use merge
you can still do that, but you'll need to add some syntax to keep your indicies (more info):
df = dfA.reset_index().merge(dfB.reset_index(), how = 'outer').set_index('tradeID')
I ran super rudimentary timing on these two options and combine_first()
consistently beat merge
by nearly 3x on this very small data set.
...and Igor Raush's version tested at or slightly faster than combine_first()
.