0

I have a pandas dataframe df that I built using 3 levels of columns, as follows:

     a1            a2            a3
     b1     b2     b1     b3     b1     b4
     c1 c2  c1 c2  c1 c2  c1 c2  c1 c2  c1 c2
     ... (data) ...

Note that each a column may have different b subcolumns, but each b column has the same c subcolumns.

I can extract e.g. the subcolumns from a2 using df["a2"].

How can I select based on the second or third level without having to specify the first and second level respectively? For instance I would like to say "give me all the c2 columns you can find" and I would get:

     a1     a2     a3
     b1 b2  b1 b3  b1 b4
     ... (data for the c2 columns) ...

Or "give me all the b1 columns" and I would get:

     a1     a2     a3
     c1 c2  c1 c2  c1 c2 
     ... (data for the b1 columns) ...
sunmat
  • 6,976
  • 3
  • 28
  • 44
  • 1
    On a side note, it'd be nice if you provide some example data that can be copied and pasted especially since creating MultiIndexes is a bit of a pain. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – fsimonjetz Nov 11 '22 at 13:48

2 Answers2

2

The docs provide some info on that. Adapting the examples from there to your example, either use tuples with slice objects you pass None,

df.loc[:, (slice(None), slice(None), "c2")]

or use pd.IndexSliceto use the familiar : notation:

idx = pd.IndexSlice
df.loc[:, idx[:, :, "c2"]]
fsimonjetz
  • 5,644
  • 3
  • 5
  • 21
1

When you have a dataframe with a MultiIndex the columns are a list of tuples containing the keys of each level.

You can get the columns you want with a list comprehension checking if the name you want is in the tuple:

# get all the "c2"
df[[col for col in df.columns if "c2" in col]]
# get all the "b1"
df[[col for col in df.columns if "b1" in col]]

If you want to be sure that the column name is in the right position:

# get all the "c2"
df[[col for col in df.columns if col[2] == "c2"]]
# get all the "b1"
df[[col for col in df.columns if col[1] == "b1"]]
Matteo Zanoni
  • 3,429
  • 9
  • 27