Indexing with multiindex dataframe in pandas

Question

Consider the following example data:

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          740     389
           1          786     815
           2          178     265
           3          841     484
           4          211     534

So, How can I query the dataframe only with the second level index Patient only? For example, I'd like to know all the data with respect to Patient 2.

I've tried data[data.index.get_level_values(1)==2], and it worked fine. But is there any way to achieve the same with one these (loc,iloc or ix) indexing methods?

score 3 · Accepted Answer · answered Mar 26 '17 at 10:05

3

I think the simpliest is use xs:

np.random.seed(100)
names = ['Taxon','Patient']
mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns=['Tissue','Stool'], index=mux)
print (df)
                    Tissue  Stool
Taxon      Patient               
Firmicutes 1             8      8
           2             3      7
           3             7      0
           4             4      2
           5             5      2
another    1             2      2
           2             1      0
           3             8      4
           4             0      9
           5             6      2

print (df.xs(2, level=1))
            Tissue  Stool
Taxon                    
Firmicutes       3      7
another          1      0

#if need also level Patient
print (df.xs(2, level=1, drop_level=False))
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0

Solution with loc - is possible specify axis:

print (df.loc(axis=0)[:,2])
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0

answered Mar 26 '17 at 10:05

jezrael

822,522
95
1,334
1,252

Great help. The method works like a charm, and I didn't even know it had the `level` option. Thanks a lot. – James Wong Mar 26 '17 at 10:49
The `loc` one still doesn't work, and give an error: `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'`. Have no idea why that is. – James Wong Mar 26 '17 at 10:54
You need `df = df.sort_index(axis=1)` - see [docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html#sorting-a-multiindex) – jezrael Mar 26 '17 at 11:00
Indeed. Thank you very much. :) – James Wong Mar 26 '17 at 11:03
I typically just use df.loc[:, ('Upper Index', ('lower_idx_v1', 'lower_idx_v2'))] -- can't that work here? I find that simpler. – Zach Oakes Feb 21 '21 at 17:28
@ZachOakes- If need select by multiple values, then `df.xs(2, level=1)` cannot be used, but if need select only by one level,simplier should be `df.xs(2, level=1)` ? – jezrael Feb 21 '21 at 17:31

pansen · Answer 2 · 2017-03-26T10:48:56.507

0

Yes, use pd.IndexSlice which is exactly what you are looking for. See the documentation here.

Some dummy data:

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          158     137
           1          697     980
           2          751     759
           3          171     556
           4          701     620

You can write it explicitly like:

df.loc[(slice(None), 2), :]

                        Stool   Tissue
Taxon       Patient         
Firmicutes        2     751     759

Or you may use the more readable pd.IndexSlice:

idx = pd.IndexSlice
df.loc[idx[:, 2], :]

                        Stool   Tissue
Taxon       Patient         
Firmicutes        2     751     759

edited Mar 26 '17 at 10:48

answered Mar 26 '17 at 09:05

pansen

6,433
4
19
32

Unfortunately, your second solution throws me `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'`. What is that about? – James Wong Mar 26 '17 at 10:43
@JamesWong It works in my dummy example as updated in the answer. As the `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted` mentions, try to sort the index of your MultiIndexed data frame first with `df = df.sort_index()` as described [here](http://stackoverflow.com/questions/29266600/how-to-slice-one-multiindex-dataframe-with-the-multiindex-of-another) in more detail. – pansen Mar 26 '17 at 10:51

Indexing with multiindex dataframe in pandas

2 Answers2

Linked

Related