I am trying to concat 2 DataFrames, but .join is creating an unwanted duplicate.
df_ask:
timestamp price volume
1520259290 10.5 100
1520259275 10.6 2000
1520259275 10.55 200
df_bid:
timestamp price volume
1520259290 10.25 500
1520259280 10.2 300
1520259275 10.1 400
I tried:
depth = pd.concat([df_ask,df_bid], axis=1, keys=['Ask Orders','Bid Orders'])
but that returns an error which I do understand ("concat failed Reindexing only valid with uniquely valued Index objects")
and I tried:
df_ask.join(df_bid, how='outer', lsuffix='_ask', rsuffix='_bid')
Which gives no error, but gives the following dataframe:
timestamp price_ask volume_bid price_bid volume_bid
1520259290 10.5 100 10.25 500
1520259280 NaN NaN 10.2 300
1520259275 10.6 2000 10.1 400
1520259275 10.55 200 10.1 400
My problem is the repeated 10.1 and 400 at timestamp 1520259275. They weren't in the original df_bid dataframe twice and should only be in this df once. Having two rows of the same timestamp is correct as there are two ask rows at this time, however there should only be one bid information row associated with this timestamp. The other should be NaN.
ie What I'm looking for is this:
timestamp price_ask volume_bid price_bid volume_bid
1520259290 10.5 100 10.25 500
1520259280 NaN NaN 10.2 300
1520259275 10.6 2000 10.1 400
1520259275 10.55 200 NaN NaN
I've looked through the merge/join/concat documentation and this question but I can't find what I'm looking for. Thanks in advance