-1

Consider the following example data:

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          740     389
           1          786     815
           2          178     265
           3          841     484
           4          211     534

So, How can I query the dataframe only with the second level index Patient only? For example, I'd like to know all the data with respect to Patient 2.

I've tried data[data.index.get_level_values(1)==2], and it worked fine. But is there any way to achieve the same with one these (loc,iloc or ix) indexing methods?

pansen
  • 6,433
  • 4
  • 19
  • 32
James Wong
  • 1,107
  • 3
  • 15
  • 26

2 Answers2

3

I think the simpliest is use xs:

np.random.seed(100)
names = ['Taxon','Patient']
mux = pd.MultiIndex.from_product([['Firmicutes', 'another'], range(1, 6)], names=names)
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns=['Tissue','Stool'], index=mux)
print (df)
                    Tissue  Stool
Taxon      Patient               
Firmicutes 1             8      8
           2             3      7
           3             7      0
           4             4      2
           5             5      2
another    1             2      2
           2             1      0
           3             8      4
           4             0      9
           5             6      2

print (df.xs(2, level=1))
            Tissue  Stool
Taxon                    
Firmicutes       3      7
another          1      0

#if need also level Patient
print (df.xs(2, level=1, drop_level=False))
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0

Solution with loc - is possible specify axis:

print (df.loc(axis=0)[:,2])
                    Tissue  Stool
Taxon      Patient               
Firmicutes 2             3      7
another    2             1      0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Great help. The method works like a charm, and I didn't even know it had the `level` option. Thanks a lot. – James Wong Mar 26 '17 at 10:49
  • The `loc` one still doesn't work, and give an error: `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'`. Have no idea why that is. – James Wong Mar 26 '17 at 10:54
  • You need `df = df.sort_index(axis=1)` - see [docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html#sorting-a-multiindex) – jezrael Mar 26 '17 at 11:00
  • Indeed. Thank you very much. :) – James Wong Mar 26 '17 at 11:03
  • I typically just use df.loc[:, ('Upper Index', ('lower_idx_v1', 'lower_idx_v2'))] -- can't that work here? I find that simpler. – Zach Oakes Feb 21 '21 at 17:28
  • @ZachOakes- If need select by multiple values, then `df.xs(2, level=1)` cannot be used, but if need select only by one level,simplier should be `df.xs(2, level=1)` ? – jezrael Feb 21 '21 at 17:31
0

Yes, use pd.IndexSlice which is exactly what you are looking for. See the documentation here.

Some dummy data:

data = {"Taxon": ["Firmicutes"]*5,
        "Patient": range(5),
        "Tissue": np.random.randint(0, 1000, size=5),
        "Stool": np.random.randint(0, 1000, size=5)}

df = pd.DataFrame(data).set_index(["Taxon", "Patient"])
print(df)

                    Stool  Tissue
Taxon      Patient               
Firmicutes 0          158     137
           1          697     980
           2          751     759
           3          171     556
           4          701     620

You can write it explicitly like:

df.loc[(slice(None), 2), :]

                        Stool   Tissue
Taxon       Patient         
Firmicutes        2     751     759

Or you may use the more readable pd.IndexSlice:

idx = pd.IndexSlice
df.loc[idx[:, 2], :]

                        Stool   Tissue
Taxon       Patient         
Firmicutes        2     751     759
pansen
  • 6,433
  • 4
  • 19
  • 32
  • Unfortunately, your second solution throws me `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'`. What is that about? – James Wong Mar 26 '17 at 10:43
  • @JamesWong It works in my dummy example as updated in the answer. As the `KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted` mentions, try to sort the index of your MultiIndexed data frame first with `df = df.sort_index()` as described [here](http://stackoverflow.com/questions/29266600/how-to-slice-one-multiindex-dataframe-with-the-multiindex-of-another) in more detail. – pansen Mar 26 '17 at 10:51