
Note: this looks similar to "Pandas get topmost n records within each group", but I would prefer to do this without demoting my MultiIndex to columns.

Suppose I have a data frame that looks like this:

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "bar",   "foo",   "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "three", "three", "one", "two", "two", "one"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)


I would like to take the first two entries within each value of the first index level, so my final output will be (looking at the index only, and ignoring the values in the table):

bar  one
     two
foo  three
     one
qux  two
     one

I've looked at the documentation page on multi-indexing (https://pandas.pydata.org/docs/user_guide/advanced.html), but I can't see anything that does what I'm asking for. All the slicing examples using : are for loc, which I can't use, because my levels are not sorted and I don't know in advance what they will be.

Syntactically, what I'm trying to do is something like:

idx = pd.IndexSlice
df.iloc[idx[0:3, 0:3], :]

... which works for loc (if the index is lexsorted), but not for iloc.

butterflyknife
    You can't do this, because `iloc` positions are relative to the full DataFrame, not to each group. Use `df.groupby(level=0).head(2)`, or `df.groupby(level=0, group_keys=False).apply(lambda d: d.iloc[0:2])` (less elegant and less efficient) – mozway Mar 15 '23 at 12:04
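
For reference, here is a minimal sketch of the two approaches suggested in the comment above, applied to the example frame from the question. The variable names `top2` and `top2_apply` are only illustrative. It assumes that "top two" means the first two rows of each first-level group in their existing order, not the two largest values.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "bar", "foo", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "three", "three", "one", "two", "two", "one"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# First two rows of every level-0 group; the MultiIndex stays an index
# and the rows keep their original order.
top2 = df.groupby(level=0).head(2)

# Equivalent but slower: positional slicing inside each group.
# group_keys=False avoids adding the group key as an extra index level.
top2_apply = df.groupby(level=0, group_keys=False).apply(lambda d: d.iloc[0:2])

print(top2.index.tolist())
```

Neither call requires the index to be sorted, and neither needs the level values to be known in advance, which are the two constraints that rule out `loc` slicing in the question.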
