0

I have 3 DataFrames:

import pandas as pd
df1 =  pd.DataFrame( np.random.randn(100,4), index = pd.date_range('1/1/2010', periods=100), columns = {"A", "B", "C", "D"}).T.sort_index()
df2 =  pd.DataFrame( np.random.randn(100,4), index = pd.date_range('1/1/2010', periods=100), columns = {"A", "B", "C", "D"}).T.sort_index()
df3 =  pd.DataFrame( np.random.randn(100,4), index = pd.date_range('1/1/2010', periods=100), columns = {"A", "B", "C", "D"}).T.sort_index()

I concatenate them creating a DataFrame with multi levels:

df_c = pd.concat([df1, df2, df3], axis = 1, keys = ["df1", "df2", "df3"])

Swap levels and sort:

df_c.columns = df_c.columns.swaplevel(0,1)
df_c = df_c.reindex_axis(sorted(df_c.columns), axis = 1)


ipdb> df_c
    2010-01-01                     2010-01-02
     df1       df2       df3        df1       df2       df3
A  -0.798407  0.124091  0.271089   0.754759 -0.575769  1.501942
B   0.602091 -0.415828  0.152780   0.530525  0.118447  0.057240
C  -0.440619 -1.074837 -0.618084   0.627520 -1.298814  1.029443
D  -0.242851 -0.738948 -1.312393   0.559021  0.196936 -1.074277

I would like to slice it to get values for individual rows, but so far I have only achieved such a degree of slicing:

cols = df_c.T.index.get_level_values(0)

ipdb> df_c.xs(cols[0], axis = 1, level = 0)
        df1       df2       df3
A -0.798407  0.124091  0.271089
B  0.602091 -0.415828  0.152780
C -0.440619 -1.074837 -0.618084
D -0.242851 -0.738948 -1.312393

The only way I found to get the values for each raw is to define a new dataframe,

   slcd_df = df_c.xs(cols[0], axis = 1, level = 0)

and then select rows using the usual proceadure:

  ipdb> slcd_df.ix["A", :]
  df1   -0.798407
  df2    0.124091
  df3    0.271089

But I was wondering whether there exists a better (meaning faster and more elegant) way to slice multilevel Dataframes.

Miquel
  • 668
  • 9
  • 22

1 Answers1

1

You can use pd.IndexSlice:

idx = pd.IndexSlice
sliced = df_c.loc["A", idx["2010-01-01", :]]
print(sliced)

2010-01-01  df1    0.199332
            df2    0.887018
            df3   -0.346778
Name: A, dtype: float64

Or you may also use slice(None):

print(df_c.loc["A", ("2010-01-01", slice(None))])

2010-01-01  df1    0.199332
            df2    0.887018
            df3   -0.346778
Name: A, dtype: float64
pansen
  • 6,433
  • 4
  • 19
  • 32
  • Your solution is nicer, but python throws an error saying that the index has to be fully lexsorted – Miquel Mar 29 '17 at 08:08
  • Yes, that's a common error here when the index is not sorted. It works on the dummy data used here but probably fails on your actual data, right? Use `sort_index` as described [here](http://stackoverflow.com/questions/29266600/how-to-slice-one-multiindex-dataframe-with-the-multiindex-of-another). – pansen Mar 29 '17 at 08:11
  • Exactly. I have sorted the data and dropped some duplicated values but still the error persists. Anyway, thanks for your answer! – Miquel Mar 29 '17 at 08:17
  • Have you used `df.sort_index(axis=1)`? You want to sort your columns here. The default of `sort_index` is rows. – pansen Mar 29 '17 at 08:23
  • yes yes, I remember that I faced that problem once. The thing is that I cannot sort the sublevels (df1, df2, df3 in this case). In my case I have more sublevels(count, rank3, rank5, rank10) and I never figured out how to sort them – Miquel Mar 29 '17 at 08:31