pandas: slice a MultiIndex by range of secondary index

Question

I have a series with a MultiIndex like this:

import numpy as np
import pandas as pd

buckets = np.repeat(['a','b','c'], [3,5,1])
sequence = [0,1,5,0,1,2,4,50,0]

s = pd.Series(
    np.random.randn(len(sequence)), 
    index=pd.MultiIndex.from_tuples(zip(buckets, sequence))
)

# In [6]: s
# Out[6]: 
# a  0    -1.106047
#    1     1.665214
#    5     0.279190
# b  0     0.326364
#    1     0.900439
#    2    -0.653940
#    4     0.082270
#    50   -0.255482
# c  0    -0.091730

I'd like to get the s['b'] values where the second index ('sequence') is between 2 and 10.

Slicing on the first index works fine:

s['a':'b']
# Out[109]: 
# bucket  value
# a       0        1.828176
#         1        0.160496
#         5        0.401985
# b       0       -1.514268
#         1       -0.973915
#         2        1.285553
#         4       -0.194625
#         5       -0.144112

But not on the second, at least by what seems to be the two most obvious ways:

1) This returns the elements at positions 1 through 4, regardless of the index values:

s['b'][1:10]

# In [61]: s['b'][1:10]
# Out[61]: 
# 1     0.900439
# 2    -0.653940
# 4     0.082270
# 50   -0.255482

However, if I reverse the index and the first index is integer and the second index is a string, it works:

In [26]: s
Out[26]: 
0   a   -0.126299
1   a    1.810928
5   a    0.571873
0   b   -0.116108
1   b   -0.712184
2   b   -1.771264
4   b    0.148961
50  b    0.089683
0   c   -0.582578

In [25]: s[0]['a':'b']
Out[25]: 
a   -0.126299
b   -0.116108

To run this code with Python 3, need to modify: `index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence)))` (note the new `list`) — ashishsingal, Feb 14 '18 at 15:54
If you are interested in learning more about slicing and filtering multiindex DataFrames, please take a look at my post: [How do I slice or filter MultiIndex DataFrame levels?](https://stackoverflow.com/questions/53927460/how-do-i-slice-or-filter-multiindex-dataframe-levels). Thanks! — cs95, Jan 05 '19 at 07:02

score 39 · Accepted Answer · edited May 23 '17 at 11:59

As Robbie-Clarken answers, since 0.14 you can pass a slice in the tuple you pass to loc:

In [11]: s.loc[('b', slice(2, 10))]
Out[11]:
b  2   -0.65394
   4    0.08227
dtype: float64

Indeed, you can pass a slice for each level:

In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
Out[12]:
a  5    0.27919
b  2   -0.65394
   4    0.08227
dtype: float64

Note: the slice is inclusive of both endpoints, because it selects by label rather than by position.
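
To make the inclusive behaviour concrete, here is a minimal sketch that rebuilds the question's index with deterministic values. The `np.arange` values and the variable names `up_to_10` and `two_to_four` are assumptions for illustration (the original used random data), and the sketch assumes a reasonably recent pandas and a lexsorted index.

```python
import numpy as np
import pandas as pd

# Same index as in the question; the values 0.0 .. 8.0 are placeholders
# chosen so that each row is easy to identify.
buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(9.0),
    index=pd.MultiIndex.from_arrays([buckets, sequence]),
)

# Label 10 does not exist in the second level; on a sorted index the
# slice simply stops at the last label that is <= 10.
up_to_10 = s.loc[('b', slice(2, 10))]

# Both endpoints are kept when they exist as labels: 2 and 4 are included.
two_to_four = s.loc[('b', slice(2, 4))]

print(up_to_10)
print(two_to_four)
```

Both selections should contain the rows labelled ('b', 2) and ('b', 4), which is what distinguishes a label slice from a positional one.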

Old answer (written when ix was current; ix was deprecated in pandas 0.20 and removed in 1.0):

You can also do this using:

s.ix[1:10, "b"]

(It's good practice to do this in a single ix/loc/iloc call, since that form allows assignment.)

This answer was written before iloc (position/integer location) was introduced in early 2013; iloc may be preferred in this case. It was created to remove the ambiguity of integer-indexed pandas objects and to be more descriptive: "I'm slicing on position".

s["b"].iloc[1:10]

That said, I kinda disagree with the docs that ix is the "most robust and consistent way". It's not; the most consistent way is to describe what you're doing:

use loc for labels
use iloc for position
use ix for both (if you really have to)

Remember the zen of python:

explicit is better than implicit
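
The loc/iloc distinction above can be sketched on the 'b' bucket from the question. This is only an illustration under assumptions: the values are deterministic placeholders rather than the original random data, and the names `sub`, `by_label` and `by_position` are made up for the example.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(9.0),  # placeholder values, not the original random data
    index=pd.MultiIndex.from_arrays([buckets, sequence]),
)

# Selecting one bucket drops the outer level; the remaining index
# holds the labels 0, 1, 2, 4, 50.
sub = s.loc['b']

# loc slices by label: every label between 2 and 10, endpoints included.
by_label = sub.loc[2:10]

# iloc slices by position: positions 1 up to (not including) 10,
# clipped to the length of the Series.
by_position = sub.iloc[1:10]

print(by_label.index.tolist())
print(by_position.index.tolist())
```

The label slice keeps only the labels 2 and 4, while the positional slice keeps everything from the second row onward, including the label 50.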

answered Nov 15 '12 at 00:30

Andy Hayden

It feels like there ought to be a way to do this in one pass (using loc / without chaining), however assignment (`s['b'].ix[1:10]`) works so I guess it's ok. – Andy Hayden Jan 16 '14 at 18:15
Please @Andy-Hayden update your answer to comply with the new pandas API. As Robbie-Clarken shows: loc and slice indexing are recommended. – tbrittoborges Jan 27 '16 at 19:19
@mithrado thanks for pointing that out, i have been meaning to go through all my pandas answers and update them. I need to write a script as there's too many to do manually. :/ – Andy Hayden Jan 28 '16 at 00:13
Something that is not said in the answer, but that was what I was actually looking for: "You can use `slice(None)` to select all the contents of that level. You do not need to specify all the deeper levels, they will be implied as `slice(None)`". Source: [Pandas docs](http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers). – rocarvaj Jan 16 '18 at 17:10

score 13 · Answer 2 · edited Jul 24 '19 at 12:56

Since pandas 0.15.0 this works:

s.loc['b', 2:10]

Output:

b  2   -0.503023
   4    0.704880
dtype: float64

With a DataFrame it's slightly different (source):

df.loc(axis=0)['b', 2:10]
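
As a hedged sketch of why the DataFrame case differs: in a tuple passed to a DataFrame's loc, the second element is normally read as the column selector, so the row slicers have to be marked as belonging to axis 0. The frame below is an assumption for illustration (one hypothetical column named `value` on the question's index); `pd.IndexSlice` is a pandas helper that lets you keep the `2:10` syntax.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
df = pd.DataFrame(
    {'value': np.arange(9.0)},  # hypothetical column with placeholder data
    index=pd.MultiIndex.from_arrays([buckets, sequence]),
)

# 1) Tell loc that every slicer applies to the rows.
via_axis = df.loc(axis=0)['b', 2:10]

# 2) Wrap the row slicers in a tuple and select all columns explicitly.
via_tuple = df.loc[('b', slice(2, 10)), :]

# 3) Same as 2), but IndexSlice builds the tuple from slice syntax.
idx = pd.IndexSlice
via_index_slice = df.loc[idx['b', 2:10], :]

print(via_axis)
```

All three spellings should select the same two rows, so the choice is mostly a matter of readability.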

edited Jul 24 '19 at 12:56

answered Jul 07 '19 at 14:38

EliadL

This is the most up-to-date answer. It also nicely points out the subtle difference b/w series and df. Thanks! – midtownguru Jul 17 '19 at 22:25

score 11 · Answer 3 · answered Jul 18 '15 at 11:53

As of pandas 0.14.0 it is possible to slice multi-indexed objects by providing .loc a tuple containing slice objects:

In [2]: s.loc[('b', slice(2, 10))]
Out[2]:
b  2   -1.206052
   4   -0.735682
dtype: float64

answered Jul 18 '15 at 11:53

Robbie Clarken

`slice(None)` to select all contents of that level. See [Using slicers](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#using-slicers). – young_souvlaki Oct 31 '21 at 02:14
Note that using `slice()` in `loc` is still inclusive. – young_souvlaki Oct 31 '21 at 02:27

score 4 · Answer 4 · answered Nov 15 '12 at 00:47

The best way I can think of is to use 'select' in this case, although the docs (Indexing and selecting data) even say that "This method should be used only when there is no more direct way."

In [116]: s
Out[116]: 
a  0     1.724372
   1     0.305923
   5     1.780811
b  0    -0.556650
   1     0.207783
   4    -0.177901
   50    0.289365
   0     1.168115

In [117]: s.select(lambda x: x[0] == 'b' and 2 <= x[1] <= 10)
Out[117]: 
b  4   -0.177901
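
As far as I know, `Series.select` was deprecated and is no longer available in pandas 1.0 and later. A rough equivalent of the predicate approach, sketched under assumptions, is to evaluate the same condition for every index tuple and use the result as a boolean mask. The helper name `is_wanted` and the placeholder values are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]
s = pd.Series(
    np.arange(9.0),  # placeholder values
    index=pd.MultiIndex.from_arrays([buckets, sequence]),
)

def is_wanted(key):
    """Hypothetical predicate: same test as the lambda passed to select."""
    bucket, seq = key
    return bucket == 'b' and 2 <= seq <= 10

# Iterating over a MultiIndex yields one tuple per row.
mask = np.array([is_wanted(key) for key in s.index], dtype=bool)
result = s[mask]

print(result)
```

Like `select`, this calls a Python function once per row, so it is likely to be slower than label slicing on long Series.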

answered Nov 15 '12 at 00:47

Ryan O'Neill

Surprisingly (for me at least), although comparable for small Series, this starts to become slower than using `ix` when the Series is longer than 250. (Tested using `%timeit` in ipython.) – Andy Hayden Nov 15 '12 at 09:43

score 0 · Answer 5 · edited Nov 15 '12 at 00:39

Not sure if this is ideal, but it works by creating a mask:

In [59]: s.index
Out[59]: 
MultiIndex
[('a', 0) ('a', 1) ('a', 5) ('b', 0) ('b', 1) ('b', 2) ('b', 4)
 ('b', 50) ('c', 0)]
In [77]: s[(tpl for tpl in s.index if 2<=tpl[1]<=10 and tpl[0]=='b')]
Out[77]: 
b  2   -0.586568
   4    1.559988

EDIT: hayden's solution is the way to go.

edited Nov 15 '12 at 00:39

answered Nov 15 '12 at 00:16

jassinm
