pandas MultiIndex choose/drop rows based on second column

Question

Copying the example from this question, consider the following dataframe:

import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw')
], names=['one', 'two'])

df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

         col
one two     
a   t      0
    u      1
    v      2
    w      3
b   t      4
    u      5
    v      6
    w      7
    t      8
c   u      9
    v     10
d   w     11
    t     12
    u     13
    v     14
    w     15

Let's say I want to keep only the first two rows of the second level for each value of the first level, i.e. my final dataframe looks like this:

         col
one two     
a   t      0
    u      1
b   t      4
    u      5
c   u      9
    v     10
d   w     11
    t     12

What's the best way of achieving the above? Ideally, I would have liked to do something like this (obviously wrong syntax):

df.iloc[(:, :2)]

i.e. all values from level 0, and first 2 values from level 1.

score 3 · Accepted Answer · answered Dec 03 '19 at 03:35

Use head(2) with groupby:

df.groupby('one').head(2)

Out[246]:
         col
one two     
a   t      0
    u      1
b   t      4
    u      5
c   u      9
    v     10
d   w     11
    t     12
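
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies the per-group head(2). It spells the grouping as level='one', which should be equivalent to passing the bare level name. The tail(2) line is an extra added for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
mux = pd.MultiIndex.from_arrays(
    [list("aaaabbbbbccddddd"), list("tuvwtuvwtuvwtuvw")],
    names=["one", "two"],
)
df = pd.DataFrame({"col": np.arange(len(mux))}, index=mux)

# First two rows within each value of level 'one'; original row order is kept.
first_two = df.groupby(level="one").head(2)

# Illustration only: the last two rows within each value of level 'one'.
last_two = df.groupby(level="one").tail(2)

print(first_two)
print(last_two)
```

Groups with fewer than two rows are returned whole, so nothing special is needed for short groups such as c.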

answered Dec 03 '19 at 03:35

Andy L.


Ah, I was just about to update my answer with this :-). – Quang Hoang Dec 03 '19 at 03:36
@QuangHoang: Yours is nice, too :D – Andy L. Dec 03 '19 at 03:37
@AndyL. @QuangHoang Thanks for your answers. Is there a more generic answer of choosing output based on indices in a multilevel index? e.g. let's say I want to choose 1, 3, 5, 7.. `range(1, n, 2)` indices of second level in a multi level index. – skgbanga Dec 03 '19 at 11:51
@skgbanga: you want to choose `1, 3, 5, 7...` of the whole second level or of the second level of each value of first level? I.e. the whole second level would be `u, w, u, w...` while of each first level would be `a: u, w`, `b: u, w`, `c: v`... – Andy L. Dec 03 '19 at 18:23
@AndyL. Generally, I was wondering if there is a way of choosing elements from the second level of the multilevel index using indexes? – skgbanga Dec 04 '19 at 02:31
@skgbanga: on positional indexing, there are differences between getting position `3` of the 2nd level within each value of the 1st level and getting universal position `3` regardless of the level of index. That's why I wanted to clarify before answering. If you want a general way to select on the 2nd level, you may use `get_level_values`, `query`, `xs`, `groupby.nth`, or `loc` with `pd.IndexSlice` and tuples... It is really broad. @CS95 has an excellent post on this topic here https://stackoverflow.com/questions/53927460/select-rows-in-pandas-multiindex-dataframe. I recommend you read it for more detail. – Andy L. Dec 04 '19 at 03:05
oh, I didn't notice you already linked @CS95 post in your question :) – Andy L. Dec 04 '19 at 03:14
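
To make the two readings from the comments concrete, here is a rough sketch of both, assuming the frame from the question. It uses groupby(...).cumcount() for positions within each first-level value and plain iloc slicing for positions over the whole frame. The names `wanted`, `per_group` and `overall` are made up for this example, and this is only one of several possible approaches.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list("aaaabbbbbccddddd"), list("tuvwtuvwtuvwtuvw")],
    names=["one", "two"],
)
df = pd.DataFrame({"col": np.arange(len(mux))}, index=mux)

# Position of each row within its first-level group: 0, 1, 2, ...
pos = df.groupby(level="one").cumcount()

# Reading 1: positions 1, 3, 5, ... inside each value of level 'one'.
n = int(pos.max()) + 1              # size of the largest group
wanted = list(range(1, n, 2))       # hypothetical position list, here [1, 3]
per_group = df[pos.isin(wanted)]

# Reading 2: positions 1, 3, 5, ... over the whole frame, ignoring groups.
overall = df.iloc[1::2]

print(per_group)
print(overall)
```

On this data the first reading gives u, w for a and b but only v for c, while the second gives u, w, u, w... all the way down, which is the difference described in the comment above.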

score 2 · Answer 2 · answered Dec 03 '19 at 03:32

Here's one way with groupby:

df[df.groupby('one').cumcount().le(1)]

Output:

         col
one two     
a   t      0
    u      1
b   t      4
    u      5
c   u      9
    v     10
d   w     11
    t     12
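
It may help to look at what cumcount() returns before it is turned into a mask. The sketch below, again assuming the question's frame, prints the per-group counter and also shows the inverted mask for the "drop" half of the title. The variable names `pos`, `kept` and `dropped` are only for illustration.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays(
    [list("aaaabbbbbccddddd"), list("tuvwtuvwtuvwtuvw")],
    names=["one", "two"],
)
df = pd.DataFrame({"col": np.arange(len(mux))}, index=mux)

# Running counter that restarts at 0 for every value of level 'one'.
pos = df.groupby(level="one").cumcount()
print(pos)

# Keep rows whose within-group position is 0 or 1 (same rows as head(2)).
kept = df[pos.le(1)]

# Drop those rows instead by inverting the condition.
dropped = df[pos.gt(1)]

print(kept)
print(dropped)
```

Because the mask is an ordinary boolean Series, it can be combined with other conditions using & and |, which is one reason to prefer it over head(2) when the selection rule gets more involved.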

answered Dec 03 '19 at 03:32

Quang Hoang

