import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

df2 = pd.DataFrame(np.random.randn(8, 4), index=arrays)

The DataFrame I have is df2. I want to select all the rows under 'foo' (both 'one' and 'two'), but only the 'one' row under 'bar' in the MultiIndex. This seems very easy, but I have tried several things without success.

df2.loc['bar':('foo','one')]

This produces a similar result, but it includes the 'baz' rows, which I don't want.

df2.loc[idx['foo','bar'],idx['one','two'], :]

This is also similar, but it includes the second row, ('foo', 'two'), which I don't want.

It would be great if anybody could help or has some tips for handling a MultiIndex!

cs95
Herwini
  • I'm really confident [Select rows in MultiIndex dataFrame](https://stackoverflow.com/questions/53927460/select-rows-in-pandas-multiindex-dataframe/53927461#53927461) will help. – cs95 Jul 07 '20 at 08:46
  • Thanks for the reply! Will have a look. – Herwini Jul 07 '20 at 08:48
  • In the meantime this seems like the fastest way to solve your problem: `pd.concat([df.loc[['foo']], df.loc[[('bar', 'one')]]])` – cs95 Jul 07 '20 at 08:54

1 Answer

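In a single line, the simplest way IMO is to build an expression with query, as described here (df is the question's df2; ilevel_0 and ilevel_1 are the names query uses for unnamed index levels):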

df.query("ilevel_0 == 'foo' or (ilevel_0 == 'bar' and ilevel_1 == 'one')") 

                0         1         2         3
bar one  0.249768  0.619312  1.851270 -0.593451
foo one  0.770139 -2.205407  0.359475 -0.754134
    two -1.109005 -0.802934  0.874133  0.135057
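
For anyone who wants to run this end to end, here is a minimal sketch. It assumes df is the frame the question calls df2, rebuilt with a seeded generator (the seed is my own choice, so the numbers will not match the output above). The boolean mask is only an equivalent spelling added for comparison, not part of the original one-liner.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an assumption for reproducibility.
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# Unnamed index levels are exposed to query as ilevel_0, ilevel_1, ...
selected = df.query("ilevel_0 == 'foo' or (ilevel_0 == 'bar' and ilevel_1 == 'one')")

# The same condition written as an explicit boolean mask on the index levels.
lvl0 = df.index.get_level_values(0)
lvl1 = df.index.get_level_values(1)
masked = df[(lvl0 == 'foo') | ((lvl0 == 'bar') & (lvl1 == 'one'))]

print(selected.index.tolist())
print(selected.equals(masked))
```

Both spellings keep the rows in their original index order, which is why 'bar' comes before 'foo' here.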

Otherwise, using more conventional means, you may consider

pd.concat([df.loc[['foo']], df.loc[[('bar', 'one')]]]) 

                0         1         2         3
foo one  0.770139 -2.205407  0.359475 -0.754134
    two -1.109005 -0.802934  0.874133  0.135057
bar one  0.249768  0.619312  1.851270 -0.593451

Which has two parts:

df.loc[['foo']]

                0         1         2         3
foo one  0.770139 -2.205407  0.359475 -0.754134
    two -1.109005 -0.802934  0.874133  0.135057

and,

df.loc[[('bar', 'one')]]

                0         1        2         3
bar one  0.249768  0.619312  1.85127 -0.593451

The extra square brackets around each key (that is, passing a list) prevent the index level from being dropped during the selection.
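
To see the effect of passing a list rather than a bare key, a small sketch may help. It again assumes df is the question's df2 rebuilt with an arbitrary seed, and the trailing sort_index() is an optional extra of mine that puts the concatenated rows back in index order.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the seed is an assumption for reproducibility.
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.default_rng(0).standard_normal((8, 4)), index=arrays)

dropped = df.loc['foo']              # scalar key: the first level is dropped
kept = df.loc[['foo']]               # list key: both levels are kept
row = df.loc[('bar', 'one')]         # tuple key: a single row comes back as a Series
row_kept = df.loc[[('bar', 'one')]]  # list of tuples: a one-row DataFrame

# Because both pieces kept the full MultiIndex, they concatenate cleanly.
combined = pd.concat([kept, row_kept]).sort_index()

print(dropped.index.nlevels, kept.index.nlevels)
print(type(row).__name__, type(row_kept).__name__)
print(combined.index.tolist())
```

Without the list wrappers, the two pieces would have different index shapes (a one-level frame and a Series), so concat would not give the desired three-row result.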

cs95