Note: this looks similar to "Pandas get topmost n records within each group", but I would prefer to do this without demoting my MultiIndex to columns.
Suppose I have a data frame that looks like this:
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "bar", "foo", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "three", "three", "one", "two", "two", "one"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
I would like to take the top two entries for each level, so my final output will be (looking at the index only, and ignoring the values in the table):
I've looked at the documentation page on multi-indexing (https://pandas.pydata.org/docs/user_guide/advanced.html), but I can't see anything that does what I'm asking for. All the slicing examples using : are for loc, which I can't use, because my levels are not sorted and I don't know in advance what they will be.
Syntactically, what I'm trying to do is something like:
idx = pd.IndexSlice
df.iloc[idx[0:3, 0:3], :]
... which works for loc (if the index is lexsorted), but not for iloc.
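For reference, here is a minimal sketch of one possible approach. It assumes that "top two" means the first two rows, in their existing order, within each value of the first index level. It groups on that level with groupby(level=0) and takes head(2), which keeps the original MultiIndex rather than moving it into columns. The seeded random generator and the name top2 are only there for illustration.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "bar", "foo", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "three", "three", "one", "two", "two", "one"]),
]
# Seeded generator so the example is reproducible (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# Group on the first index level and keep the first two rows of each group.
# head() preserves the original row order and the original MultiIndex.
top2 = df.groupby(level=0).head(2)

print(top2.index.tolist())
```

This relies on row position within each group rather than on label order, so the index does not need to be lexsorted and the level values do not need to be known in advance.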