Select all the subcolumns with a given name from pandas dataframe

Question

I have a pandas dataframe df that I built using 3 levels of columns, as follows:

     a1            a2            a3
     b1     b2     b1     b3     b1     b4
     c1 c2  c1 c2  c1 c2  c1 c2  c1 c2  c1 c2
     ... (data) ...

Note that each a column may have different b subcolumns, but each b column has the same c subcolumns.

I can extract e.g. the subcolumns from a2 using df["a2"].

How can I select based on the second or third level without having to specify the first and second level respectively? For instance I would like to say "give me all the c2 columns you can find" and I would get:

     a1     a2     a3
     b1 b2  b1 b3  b1 b4
     ... (data for the c2 columns) ...

Or "give me all the b1 columns" and I would get:

     a1     a2     a3
     c1 c2  c1 c2  c1 c2 
     ... (data for the b1 columns) ...

On a side note, it'd be nice if you provide some example data that can be copied and pasted especially since creating MultiIndexes is a bit of a pain. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — fsimonjetz, Nov 11 '22 at 13:48

score 2 · Accepted Answer · answered Nov 11 '22 at 13:43

The docs provide some info on that. Adapting the examples from there to your example, either use tuples with slice objects you pass None,

df.loc[:, (slice(None), slice(None), "c2")]

or use pd.IndexSliceto use the familiar : notation:

idx = pd.IndexSlice
df.loc[:, idx[:, :, "c2"]]

score 1 · Answer 2 · answered Nov 11 '22 at 13:48

When you have a dataframe with a MultiIndex the columns are a list of tuples containing the keys of each level.

You can get the columns you want with a list comprehension checking if the name you want is in the tuple:

# get all the "c2"
df[[col for col in df.columns if "c2" in col]]
# get all the "b1"
df[[col for col in df.columns if "b1" in col]]

If you want to be sure that the column name is in the right position:

# get all the "c2"
df[[col for col in df.columns if col[2] == "c2"]]
# get all the "b1"
df[[col for col in df.columns if col[1] == "b1"]]

Select all the subcolumns with a given name from pandas dataframe

2 Answers2