4

Question about selecting rows by MultiIndex.

df:

              colA   ColB

   A1   B1   1  ca1  cb1
             2  ca2  cb2
        B2   1  ca3  cb3
             2  ca4  cb4 
   A2   B1   1  ca5  cb5
             2  ca6  cb6
        B2   1  ca7  cb7
             2  ca8  cb8

I want to select colA from A2/B1 where MultiIndex levels(2) > 1. I'm doing it in multiple steps:

df1=df.loc[A2,[B7],:,]  

for some reason I have to present level(1) in brackets

df1.index = x1.index.droplevel(0)

df1.index = x1.index.droplevel(0)

df1.loc[1:,'colA']

I know it is way too much but I cant figure how to do it simpler. Ways I can think of don't want to work.

cs95
  • 379,657
  • 97
  • 704
  • 746
Ranny
  • 315
  • 2
  • 13
  • Possible duplicate of [How to query MultiIndex index columns values in pandas](https://stackoverflow.com/questions/17921010/how-to-query-multiindex-index-columns-values-in-pandas) – Gabriel A Jan 06 '18 at 18:58
  • ^ index in the dupe above is a named index, so it doesn't apply here. – cs95 Jan 06 '18 at 19:05
  • @Ranny, please post some reproducible data next time. You can do it easily with `df.to_dict()`. Thanks! – Tai Jan 06 '18 at 19:30

2 Answers2

6

Since you're working with unnamed index columns, one way is using query on ilevel_* -

df.query("ilevel_0 == 'A2' and ilevel_1 == 'B1' and ilevel_2 > 1")

        colA ColB
A2 B1 2  ca6  cb6

Another method is using index.get_level_values -

f = df.index.get_level_values
df.loc[(f(0) == 'A2') & (f(1) == 'B1') & (f(2) > 1)]

        colA ColB
A2 B1 2  ca6  cb6
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Before I try. I know that query is little slow. What will change if I add names to index levels? – Ranny Jan 06 '18 at 19:20
  • @Ranny Not much, actually... just the string expression. If you don't want `query`, `loc` should be pretty fast. – cs95 Jan 06 '18 at 19:23
  • Renaming `df.index.get_level_values` to `f` is pretty cool! +1 – Tai Jan 06 '18 at 19:27
  • Thanks for all help. I will go with 'f' method, but will keep other two in arsenal. I just have to get rid of first two levels of MultiIndex. I think reindex will do it just fine. Thanks again. – Ranny Jan 06 '18 at 19:54
  • @Ranny You're welcome. Don't forget to vote on our answers, and accept the most helpful one. – cs95 Jan 06 '18 at 19:55
5

You can also do it in two operations. First use loc to select parts in A2 and B1, and then do another query at the remaining index, by setting query('index > 1').

df = pd.DataFrame({'ColB': {('A1', 'B1', 1): 'cb1',
  ('A1', 'B1', 2): 'cb2',
  ('A1', 'B2', 1): 'cb3',
  ('A1', 'B2', 2): 'cb4',
  ('A2', 'B1', 1): 'cb5',
  ('A2', 'B1', 2): 'cb6',
  ('A2', 'B2', 1): 'cb7',
  ('A2', 'B2', 2): 'cb8'},
 'colA': {('A1', 'B1', 1): 'ca1',
  ('A1', 'B1', 2): 'ca2',
  ('A1', 'B2', 1): 'ca3',
  ('A1', 'B2', 2): 'ca4',
  ('A2', 'B1', 1): 'ca5',
  ('A2', 'B1', 2): 'ca6',
  ('A2', 'B2', 1): 'ca7',
  ('A2', 'B2', 2): 'ca8'}})

# Answer here
df.loc["A2", "B1"].query("index > 1")
Output:
    ColB    colA
 2  cb6     ca6
Tai
  • 7,684
  • 3
  • 29
  • 49
  • I like this one (because I understand it better). It works, just I need to sort index before. The only thing I cant pass variable inside quotation marks for querry. How do do this? a=1; "index > a" – Ranny Jan 06 '18 at 19:37
  • 1
    @Ranny .`query("index > @a")` – cs95 Jan 06 '18 at 19:41
  • @Ranny mixing `loc` and `query` isn't as clean as using either though. :-) – cs95 Jan 06 '18 at 19:42
  • Thanks Tai. Actually I will go with your solution, there is quite big difference in %timeit. 3.6 vs 71.6 (without f assignment). And I'll remember about .to_dict() next time. – Ranny Jan 06 '18 at 20:53
  • @Ranny Cool. Thanks for letting me know about your experiments. – Tai Jan 06 '18 at 20:55