2

I am new to Python and I'm trying to understand how to manipulate data with pandas DataFrames. I searched for similar questions but I don't see any satisfying my exact need. Please point me to the correct post if this is a duplicate.

So I have multiple DataFrames with the exact same shape, columns and index. How do I combine them with labels so I can easily access the data with any column/index/label?

E.g. after the setup below, how do I put df1 and df2 into one DataFrame and label them with the names 'df1' and 'df2', so that I can access data with something like df['A']['df1']['b'] and get the number of rows of df?

>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])
>>> df1
   A  B
a  1  2
b  3  4
>>> df2
   A  B
a  5  6
b  7  8
Jiaye

2 Answers

10

I think a MultiIndex DataFrame created by concat is the answer:

df = pd.concat([df1, df2], keys=('df1','df2'))
print (df)
       A  B
df1 a  1  2
    b  3  4
df2 a  5  6
    b  7  8

Then, for basic selection, you can use xs:

print (df.xs('df1'))
   A  B
a  1  2
b  3  4

To select by index and columns together, use slicers:

idx = pd.IndexSlice
print (df.loc[idx['df1', 'b'], 'A'])
3
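
As a minimal sketch of the access pattern asked for in the question (assuming the same df1 and df2 as in the setup, and the concat with keys shown above), chained lookups on the MultiIndex result and a row count could look like this:

```python
import pandas as pd

# Same setup as in the question.
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])

# Stack the frames; the keys become the outer level of the row MultiIndex.
df = pd.concat([df1, df2], keys=('df1', 'df2'))

# Chained read access: column 'A', then label 'df1', then row 'b'.
value = df['A']['df1']['b']

# The same lookup in a single .loc call, which avoids chained indexing.
value_loc = df.loc[('df1', 'b'), 'A']

# Total number of rows in the combined frame, and rows per label.
total_rows = len(df)
rows_per_frame = df.groupby(level=0).size()

print(value, value_loc, total_rows)
print(rows_per_frame)
```

Chained indexing is fine for reading values like this; for assignment, the single .loc form is usually the safer choice.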

Another possible solution is to use Panels, but Panel is deprecated in newer versions of pandas.

jezrael
  • Thank you very much! Yes it looks like MultiIndex DataFrame is what I need. Is it possible to convert df['A'] into a 2D DataFrame with ['a', 'b'] as index and ['df1', 'df2'] as columns? – Jiaye Oct 24 '17 at 05:52
  • I think yes, only add `axis=1` like `df = pd.concat([df1, df2], keys=('df1','df2'), axis=1)` – jezrael Oct 24 '17 at 05:53
  • and then select by `idx = pd.IndexSlice` `print (df.loc['b',idx['df1', 'A']])` – jezrael Oct 24 '17 at 05:55
  • Or `df = pd.concat([df1, df2], keys=('df1','df2')).unstack(0)` ? and then `print (df.xs('A', axis=1))` ? – jezrael Oct 24 '17 at 06:01
  • @jezrael Hi jezrael, I'm trying to do the same here, concatenating many pds in a loop. How can I properly concat the keys as well? I tried building a list that I gradually append to, but it returns "Cannot concat indices that do not have the same number of levels" (which makes sense actually..) – lorenzo Mar 26 '19 at 10:13
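
The follow-up comments above can be sketched as follows. This is only an illustration: the loop, the labels df1..df3 and the generated numbers are made up for the example. The idea is to collect the frames in a dict inside the loop and call concat once at the end, so the dict keys become the outer index level. Concatenating an already keyed result with a plain frame on each iteration is one way the "same number of levels" error may arise, and collecting first avoids that.

```python
import pandas as pd

# Collect frames under their labels; the labels and values here are
# hypothetical placeholders for whatever the loop really produces.
frames = {}
for i in range(3):
    name = f"df{i + 1}"
    frames[name] = pd.DataFrame(
        [[i, i + 1], [i + 2, i + 3]],
        columns=['A', 'B'],
        index=['a', 'b'],
    )

# One concat call at the end: dict keys become the outer index level.
stacked = pd.concat(frames)

# 2D view of column 'A': rows 'a'/'b' as index, labels as columns.
wide_A = stacked['A'].unstack(0)

print(stacked)
print(wide_A)
```

Passing a list of frames together with a matching keys list to pd.concat should work the same way, as long as concat is called once on the plain frames rather than repeatedly on partial results.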
0

Using xarray is recommended, as other answers to similar questions have suggested, since pandas Panels were deprecated in favour of xarray.
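
A rough sketch of what that might look like, assuming xarray is installed and using the df1 and df2 from the question. The dimension names 'frame', 'row' and 'col' are arbitrary labels chosen for this example, not anything xarray requires:

```python
import numpy as np
import pandas as pd
import xarray as xr

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])

# Stack the two 2D arrays into one 3D array with labelled axes.
da = xr.DataArray(
    np.stack([df1.to_numpy(), df2.to_numpy()]),
    dims=('frame', 'row', 'col'),
    coords={
        'frame': ['df1', 'df2'],
        'row': list(df1.index),
        'col': list(df1.columns),
    },
)

# Label-based lookup on any combination of the three axes.
value = da.sel(frame='df1', row='b', col='A').item()

# Number of rows per frame.
n_rows = da.sizes['row']

print(value, n_rows)
```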

Christabella Irwanto