20

I have a pandas dataframe with 3 levels of a MultiIndex. I am trying to pull out rows of this dataframe according to a list of values that correspond to two of the levels.

I have something like this:

ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
data = np.arange(len(ix))
df = pd.DataFrame(data, index=ix, columns=['hi'])
print(df)

           hi
a b   c      
1 foo baz   0
      can   1
  bar baz   2
      can   3
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11

Now I want to take all rows where index levels 'b' and 'c' are in this index:

ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])

i.e. values of hi having ('foo', 'can') or ('bar', 'baz') in levels b and c respectively: (1, 2, 5, 6, 9, 10).

So I'd like to take a slice(None) on the first level, and pull out specific tuples on the second and third levels.

Initially I thought that passing a multi-index object to .loc would pull out the values / levels that I wanted, but this isn't working. What's the best way to do something like this?

YaOzI
  • 16,128
  • 9
  • 76
  • 72
choldgraf
  • 3,539
  • 4
  • 22
  • 27
  • I had a couple of different attempts at getting this to work. I think I've found a decent workaround to the fact that this currently seems to be a bit hard. Try it out! – LondonRob Aug 13 '15 at 13:43

4 Answers4

21

Here is a way to get this slice:

df.sort_index(inplace=True)
idx = pd.IndexSlice
df.loc[idx[:, ('foo','bar'), 'can'], :]

yielding

           hi
a b   c      
1 bar can   3
  foo can   1
2 bar can   7
  foo can   5
3 bar can  11
  foo can   9

Note that you might need to sort MultiIndex before you can slice it. Well pandas is kind enough to warn if you need to do it:

KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'

You can read more on how to use slicers in the docs

If for some reason using slicers is not an option here is a way to get the same slice using .isin() method:

df[df.index.get_level_values('b').isin(ix_use.get_level_values(0)) & df.index.get_level_values('c').isin(ix_use.get_level_values(1))]

Which is clearly not as concise.

UPDATE:

For the conditions that you have updated here is a way to do it:

cond1 = (df.index.get_level_values('b').isin(['foo'])) & (df.index.get_level_values('c').isin(['can']))
cond2 = (df.index.get_level_values('b').isin(['bar'])) & (df.index.get_level_values('c').isin(['baz']))
df[cond1 | cond2]

producing:

           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10
Primer
  • 10,092
  • 5
  • 43
  • 55
  • This is really close to what I'm looking for, but I should have phrased the question more clearly. In reality, what I need is a value from level "c" that is dependent on the value in level "b". E.g., any time that level "b" is 'foo', I want the value where level "c" is 'can', and any time the level "b" is 'bar', I want the value where level "c" is 'baz' – choldgraf Mar 25 '15 at 22:53
  • Updated the answer with those two conditions, which should give an idea how to handle this. – Primer Mar 26 '15 at 08:04
2

I would recommend the query() method just like in this Q&A.

Simply using this, which I think is a more natural way to express:

In [27]: df.query("(b == 'foo' and c == 'can') or (b == 'bar' and c == 'baz')")
Out[27]: 
           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10
Community
  • 1
  • 1
YaOzI
  • 16,128
  • 9
  • 76
  • 72
0

I find it interesting that this doesn't work:

In [45]: df.loc[(idx[:, 'foo', 'can'], idx[:, 'bar', 'baz']), ]
Out[45]: 
           hi
a b   c      
1 bar baz   2
      can   3
  foo baz   0
      can   1
2 bar baz   6
      can   7
  foo baz   4
      can   5
3 bar baz  10
      can  11
  foo baz   8
      can   9

It sort of looks like it "should", somehow. In any case, here's a reasonable workaround:

Let's assume the tuples you want to slice by are in the index of another DataFrame (since it sounds like they probably are in your case!).

In [53]: ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])
In [55]: other = pd.DataFrame(dict(a=1), index=ix_use)
In [56]: other
Out[56]: 
         a
b   c     
foo can  1
bar baz  1

Now to slice df by the index of other we can use the fact that .loc/.ix allow you to give a list of tuples (see the last example here).

First let's build the list of tuples we want:

In [13]: idx = [(x, ) + y for x in df.index.levels[0] for y in other.index.values]
In [14]: idx
Out[14]: 
[(1, 'foo', 'can'),
 (1, 'bar', 'baz'),
 (2, 'foo', 'can'),
 (2, 'bar', 'baz'),
 (3, 'foo', 'can'),
 (3, 'bar', 'baz')]

Now we can pass this list to .ix or .loc:

In [17]: df.ix[idx]
Out[17]: 
           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10
LondonRob
  • 73,083
  • 37
  • 144
  • 201
0
import itertools
import pandas as pd
import numpy as np
from pandas import DataFrame as df


ix = pd.MultiIndex.from_product([[1, 2, 3], ['foo', 'bar'], ['baz', 'can']], names=['a', 'b', 'c'])
data = np.arange(len(ix))
df = pd.DataFrame(data, index=ix, columns=['hi'])
print(df)

"""
           hi
a b   c      
1 foo baz   0
      can   1
  bar baz   2
      can   3
2 foo baz   4
      can   5
  bar baz   6
      can   7
3 foo baz   8
      can   9
  bar baz  10
      can  11

"""
a = (df.index.get_level_values('b') =='foo')
b = (df.index.get_level_values('c') =='can')
ab = (a & b)
dab = df[a & b]
print(dab)

"""
          hi
a b   c      
1 foo can   1
2 foo can   5
3 foo can   9
"""
a1 = (df.index.get_level_values('b') =='bar')
b1 = (df.index.get_level_values('c') =='baz')
ab1 = (a1 & b1)
dab1 = df[a1 & b1]
print(dab1)
"""
           hi
a b   c      
1 bar baz   2
2 bar baz   6
3 bar baz  10
"""

abab = df[ab | ab1]
print(abab)
"""
           hi
a b   c      
1 foo can   1
  bar baz   2
2 foo can   5
  bar baz   6
3 foo can   9
  bar baz  10
"""
Soudipta Dutta
  • 1,353
  • 1
  • 12
  • 7