9

I'm trying to use query on a MultiIndex column. It works on a MultiIndex row, but not the column. Is there a reason for this? The documentation shows examples like the first one below, but it doesn't indicate that it won't work for a MultiIndex column.

I know there are other ways to do this, but I'm specifically trying to do it with the query function

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4,4)))
df.index = pd.MultiIndex.from_product([[1,2],['A','B']])
df.index.names = ['RowInd1', 'RowInd2']
# This works
print(df.query('RowInd2 in ["A"]'))

df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
df.columns.names = ['ColInd1', 'ColInd2']
# query on index works, but not on the multiindexed column
print(df.query('index < 2'))
print(df.query('ColInd2 in ["A"]'))
Scott B
  • 2,542
  • 7
  • 30
  • 44
  • Have you read this [answer](https://stackoverflow.com/a/37644056/4819376)? – rpanai Jul 24 '18 at 16:20
  • 3
    Yes, I know there is xs and other methods, as I indicated in my question in bold. I'm specifically looking for a way to do this with the query function. The answer might be that it isn't possible. I thought that maybe there is a slightly different syntax that can be used to query with multiindex columns. – Scott B Jul 24 '18 at 16:44
  • I completely missed the sentence in bold. My bad. – rpanai Jul 24 '18 at 17:55

2 Answers2

1

To answer my own question, it looks like query shouldn't be used at all (regardless of using MultiIndex columns) for selecting certain columns, based on the answer(s) here:

Select columns using pandas dataframe.query()

Scott B
  • 2,542
  • 7
  • 30
  • 44
0

You can using IndexSlice

df.query('ilevel_0>2')
Out[327]: 
ColInd1         1                  2          
ColInd2         A         B        A         B
3        0.652576  0.639522  0.52087  0.446931
df.loc[:,pd.IndexSlice[:,'A']]
Out[328]: 
ColInd1         1         2
ColInd2         A         A
0        0.092394  0.427668
1        0.326748  0.383632
2        0.717328  0.354294
3        0.652576  0.520870
BENY
  • 317,841
  • 20
  • 164
  • 234
  • 4
    I am aware of this, but I'm specifically looking for a way to do this with the query function. The answer might be that it isn't possible, but I thought that maybe there is a slightly different syntax that can be used to query with multiindex columns. – Scott B Jul 24 '18 at 16:46