89

Given a dictionary of data frames like:

dict = {'ABC': df1, 'XYZ' : df2}   # of any length...

where each data frame has the same columns and similar index, for example:

data           Open     High      Low    Close   Volume
Date                                                   
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149

What is the simplest way to combine all the data frames into one, with a multi-index like:

symbol         ABC                                       XYZ
data           Open     High      Low    Close   Volume  Open ...
Date                                                   
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  ...
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  ...
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149  ...

I've tried a few methods - eg for each data frame replace the columns with a multi-index like .from_product(['ABC', columns]) and then concatenate along axis=1, without success.

user247702
  • 23,641
  • 15
  • 110
  • 157
Zero
  • 11,593
  • 9
  • 52
  • 70

3 Answers3

113

You can do it with concat (the keys argument will create the hierarchical columns index):

d = {'ABC' : df1, 'XYZ' : df2}
print pd.concat(d.values(), axis=1, keys=d.keys())


                XYZ                                          ABC           \
               Open     High      Low    Close   Volume     Open     High   
Date                                                                        
2002-01-17  0.18077  0.18800  0.16993  0.18439  1720833  0.18077  0.18800   
2002-01-18  0.18439  0.21331  0.18077  0.19523  2027866  0.18439  0.21331   
2002-01-21  0.19523  0.20970  0.19162  0.20608   771149  0.19523  0.20970   


                Low    Close   Volume  
Date                                   
2002-01-17  0.16993  0.18439  1720833  
2002-01-18  0.18077  0.19523  2027866  
2002-01-21  0.19162  0.20608   771149

Really concat wants lists so the following is equivalent:

print(pd.concat([df1, df2], axis=1, keys=['ABC', 'XYZ']))
Seanny123
  • 8,776
  • 13
  • 68
  • 124
Karl D.
  • 13,332
  • 5
  • 56
  • 38
  • I don't think you need list constructor in Python 3. Might have changed with newer pandas versions. – Brad Solomon Oct 13 '17 at 20:41
  • 1
    @brad-solomon, you're right about current version of pandas. I think I answered this originally when most people were using 0.13.0 or 0.13.1 – Karl D. Oct 14 '17 at 21:55
  • 6
    as [discussed in this recent question](https://stackoverflow.com/questions/47172349/what-ordering-does-dict-keys-and-dict-values-guarantee), using `d.values` and `d.keys` this way should be avoided, since it's not guaranteed that the order will be maintained. Consider `keys, values = zip(*d.items())` – Adam Smith Nov 08 '17 at 07:01
  • 3
    I thought the order was arbitrary but consistent? Is that not true? As long as it's consistent across the calls, it's fine for this use case. – Karl D. Nov 09 '17 at 00:21
  • @KarlD., you are right. python2 [docs](https://docs.python.org/2/library/stdtypes.html#dict.items). This point is also discussed to some length in [this question](https://stackoverflow.com/questions/47172349/what-ordering-does-dict-keys-and-dict-values-guarantee/47172477#47172477). – shx2 Jul 17 '19 at 10:58
  • You can add level names directly. ``pd.concat([df1, df2], axis=1, keys=['ABC', 'XYZ'], names=["symbol", "price"])`` – Andi Jan 10 '22 at 11:53
4

pandas.concat does this automagically now:

import pandas as pd

index = ["row1", "row2"]
df_a = pd.DataFrame({"foo": range(0, 2), "bar": range(2, 4)}, index)
df_b = pd.DataFrame({"foo": range(4, 6), "bar": range(6, 8)}, index)

pd.concat({"A": df_a, "B": df_b}, axis=1)  # axis="columns" also works
       A       B    
     foo bar foo bar
row1   0   2   4   6
row2   1   3   5   7
william_grisaitis
  • 5,170
  • 3
  • 33
  • 40
3

Add a symbol column to your dataframes and set the index to include the symbol column, concat and then unstack that level:

The following assumes that there are as many symbols as DataFrames in your dict, and also that you check that the order of symbols is as you want it based on the order of the dict keys:

DF_dict = {'ABC': df1, 'XYZ' : df2} 
dict_keys = DF_dict.keys()
symbols = ['ABC', 'ZXY']

for x in xrange(len(symbols)):
    DF_dict[dict_keys[x]]['symbol'] = symbols[x]
    DF_dict[dict_keys[x]].reset_index(inplace = True)
    DF_dict[dict_keys[x]].set_index(['symbol', 'Date'], inplace = True)

DF = pd.concat(DF_dict[df] for df in dict_keys)
DF = DF.unstack('symbol')

I think that would be the approach I would take. Some people are against the inplace syntax. I use it here only as convenience.

Woody Pride
  • 13,539
  • 9
  • 48
  • 62
  • Wasn't able to test this as had no data, but it came from something I wrote previously. Let me know if it works.... – Woody Pride May 12 '14 at 03:40
  • That works - the key concept is to add a column and then pivot/unstack etc. I was stuck thinking about adding rows or column headings. – Zero May 12 '14 at 04:00
  • Yes, but Karl D's answer might be better for you. See the answer below. It's a lot more succinct – Woody Pride May 12 '14 at 04:08