Elegant way to sum over duplicate MultiIndex values

Question

I have a DataFrame with a many-levelled MultiIndex.

I know that there are duplicates in the MultiIndex (because I don't care about a distinction that the underlying database does care about).

I want to sum over these duplicates:

>>> x = pd.DataFrame({'month':['Sep', 'Sep', 'Oct', 'Oct'], 'day':['Mon', 'Mon', 'Mon', 'Tue'], 'sales':[1,2,3,4]})
>>> x
   day month  sales
0  Mon   Sep      1
1  Mon   Sep      2
2  Mon   Oct      3
3  Tue   Oct      4
>>> x = x.set_index(['day', 'month'])
>>> x
           sales
day month       
Mon Sep        1
    Sep        2
    Oct        3
Tue Oct        4

This should give me:

           sales
day month       
Mon Sep        3
    Oct        3
Tue Oct        4

Buried deep in this SO answer to a similar question is the suggestion:

df.groupby(level=df.index.names).sum()

But this seems to me to fail the 'readability counts' criterion of good Python code.

Does anyone know of a more human-readable way?
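For reference, here is a minimal sketch of that groupby suggestion applied to the example frame above. Wrapping the names in `list(...)` and passing `sort=False` are additions of mine rather than part of the linked answer: the first is only defensive, and the second keeps the groups in order of first appearance instead of sorting them.

```python
import pandas as pd

# Rebuild the example frame from the question.
x = pd.DataFrame({
    'month': ['Sep', 'Sep', 'Oct', 'Oct'],
    'day': ['Mon', 'Mon', 'Mon', 'Tue'],
    'sales': [1, 2, 3, 4],
}).set_index(['day', 'month'])

# Group on every index level, so rows that share the full
# (day, month) tuple collapse into one summed row.
levels = list(x.index.names)
summed = x.groupby(level=levels, sort=False).sum()
print(summed)
```

Because the grouping is separate from the aggregation, swapping `.sum()` for `.max()` or `.min()` should work the same way.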

I want all the levels, since I'm really looking for duplicates. — LondonRob, Aug 19 '14 at 16:43
pls show an example then, as you normally won't have duplicates with a multi-index, so show construction. — Jeff, Aug 19 '14 at 16:47
It might make sense to allow something like ``df.sum(level='all')``, which I guess is unambiguous if no levels are named 'all'. what do you think? — Jeff, Aug 19 '14 at 17:38
Isn't this a more generic use-case than adding something to `sum`? Might I not also want to take the `max` and `min`? In which case, it might be more like `df.groupby(level='all')`. — LondonRob, Aug 19 '14 at 17:52
of course, nothing to do with sum specifically (and ``df.sum(level=....)`` actually does a groupby), it has to do with handling ``level='all'`` translating to all levels. — Jeff, Aug 19 '14 at 17:59
By the way, a simple way to avoid this duplicates business is to just have another arbitrary level of the index that deduplicates with `0, 1, 2...` — U2EF1, Aug 20 '14 at 04:33
@Jeff : `all` is an okayish name for a column (and hence for a level). How about something along the lines of `level=[]`? Me personally, I'd prefer inconsistency with `sum()` [maybe make `=all` deprecated after all] over breaking code if users use `all` as a column name (especially since that's not a magic word). — FooBar, Aug 20 '14 at 11:23
yeh, using an empty list ``[]`` would prob be better. Pls open an issue for this suggestion if you'd like (with an example) — Jeff, Aug 20 '14 at 11:54
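U2EF1's suggestion of an extra deduplicating level could be sketched roughly as follows. The level name `dup` is a hypothetical choice, and numbering the repeats with `cumcount()` is one assumed way of producing the `0, 1, 2...` counter, not something spelled out in the comment.

```python
import pandas as pd

# Rebuild the example frame from the question.
x = pd.DataFrame({
    'month': ['Sep', 'Sep', 'Oct', 'Oct'],
    'day': ['Mon', 'Mon', 'Mon', 'Tue'],
    'sales': [1, 2, 3, 4],
}).set_index(['day', 'month'])

# Number each repeated (day, month) tuple 0, 1, 2, ... in row order.
levels = list(x.index.names)
counter = x.groupby(level=levels).cumcount().to_numpy()

# Append the counter as an extra index level ('dup' is a made-up name).
deduped = x.set_index(pd.Index(counter, name='dup'), append=True)
print(deduped)
print(deduped.index.is_unique)
```

This keeps every original row addressable; summing over the duplicates then becomes a groupby on the original levels only.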
