0

Currently I have two dataframes that look like this:

FSample

enter image description here

GMSample

enter image description here

What I want is something that ideally looks like this:

enter image description here

I attempted to do something similar to

result = pd.concat([FSample,GMSample],axis=1)
result

But my result has the data stacked on top of each other. enter image description here

Then I attempted to use the merge command like this

result = pd.merge(FSample,GMSample,how='inner',on='Date')
result

From that I got a KeyError on 'Date'

So I feel like I am missing both an understanding of how I should be trying to combine these dataframes (i.e. multi-index?) and the syntax to do so properly.

Joseph U.
  • 4,457
  • 10
  • 41
  • 47
  • 1
    It is a bit harder to recreate your use case when you provide images of your input and output dataframes. Consider using `df.to_clipboard` and following the tips in https://stackoverflow.com/a/20159305/2514228. – Micah Smith Apr 07 '18 at 22:56

2 Answers2

1

You get a key error, because the Date is an index, whereas the "on" keyword in merge takes a column. Alternatively, you could remove Symbol from the indexes and then join the dataframes by the Date indexes.

FSample.reset_index("Symbol").join(GMSample.reset_index("Symbol"), lsuffix="_x", rsuffix="_y")
knirb
  • 143
  • 7
1

Working with MultiIndexes in pandas usually requires you to constantly set/reset the index. That is probably going to be the easiest thing to do in this case as well, as pd.merge does not immediately support merging on specific levels of a MultiIndex.

df_f = pd.DataFrame(
    data = {
        'Symbol': ['F'] * 5,
        'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06', '2012-01-09']),
        'Close': [11.13, 11.30, 11.59, 11.71, 11.80],
    },
).set_index(['Symbol', 'Date']).sort_index()
df_gm = pd.DataFrame(
    data = {
        'Symbol': ['GM'] * 5,
        'Date': pd.to_datetime(['2012-01-03', '2012-01-04', '2012-01-05', '2012-01-06', '2012-01-09']),
        'Close': [21.05, 21.15, 22.17, 22.92, 22.84],
    },
).set_index(['Symbol', 'Date']).sort_index()

pd.merge(df_f.reset_index(level='Date'),
         df_gm.reset_index(level='Date'),
         how='inner',
         on='Date',
         suffixes=('_F', '_GM')
).set_index('Date')

The result:

    Close_F     Close_GM
Date        
2012-01-03  11.13   21.05
2012-01-04  11.30   21.15
2012-01-05  11.59   22.17
2012-01-06  11.71   22.92
2012-01-09  11.80   22.84
Micah Smith
  • 4,203
  • 22
  • 28