Is there a way to apply a function to a MultiIndex dataframe slice with the same outer index without iterating each slice?

Question

Basically, what I'm trying to accomplish is to fill the missing dates (creating new DataFrame rows) with respect to each product, then create a new column based on a cumulative sum of column 'A' (example shown below)

The data is a MultiIndex with (product, date) as indexes.

Basically I would like to apply this answer to a MultiIndex DataFrame using only the rightmost index and calculating a subsequent np.cumsum for each product (and all dates).

                    A
product    date 
0       2017-01-02  1
        2017-01-03  2
        2017-01-04  2
        2017-01-05  1
        2017-01-06  4
        2017-01-07  1
        2017-01-10  7
1       2018-06-29  1
        2018-06-30  4
        2018-07-01  1
        2018-07-02  1
        2018-07-04  2

What I want to accomplish (efficiently) is:

                    A      CumSum
product    date 
0       2017-01-02  1        1
        2017-01-03  2        3
        2017-01-04  2        5
        2017-01-05  1        6
        2017-01-06  4        10
        2017-01-07  1        11
        2017-01-08  0        11
        2017-01-09  0        11
        2017-01-10  7        18
1       2018-06-29  1        1
        2018-06-30  4        5
        2018-07-01  1        6
        2018-07-02  1        7
        2018-07-03  0        7
        2018-07-04  2        9

try `df.groupby(level=0).A.cumsum()` – Mark Wang Sep 02 '19 at 22:09 — Mark Wang, Sep 02 '19 at 22:09

score 1 · Accepted Answer · answered Sep 02 '19 at 23:18

You have 2 ways:

One way:
Using groupby with apply and with resample and cumsum. Finally, pd.concat result with df.A and fillna with 0

s = (df.reset_index(0).groupby('product').apply(lambda x: x.resample(rule='D')
                                                           .asfreq(0).A.cumsum()))
pd.concat([df.A, s.rename('cumsum')], axis=1).fillna(0)

Out[337]:
                      A  cumsum
product date
0       2017-01-02  1.0       1
        2017-01-03  2.0       3
        2017-01-04  2.0       5
        2017-01-05  1.0       6
        2017-01-06  4.0      10
        2017-01-07  1.0      11
        2017-01-08  0.0      11
        2017-01-09  0.0      11
        2017-01-10  7.0      18
1       2018-06-29  1.0       1
        2018-06-30  4.0       5
        2018-07-01  1.0       6
        2018-07-02  1.0       7
        2018-07-03  0.0       7
        2018-07-04  2.0       9

Another way:
you need 2 groupbys. First one for resample, 2nd one for cumsum. Finally, use pd.concat and fillna with 0

s1 = df.reset_index(0).groupby('product').resample(rule='D').asfreq(0).A
pd.concat([df.A, s1.groupby(level=0).cumsum().rename('cumsum')], axis=1).fillna(0)

Out[351]:
                      A  cumsum
product date
0       2017-01-02  1.0       1
        2017-01-03  2.0       3
        2017-01-04  2.0       5
        2017-01-05  1.0       6
        2017-01-06  4.0      10
        2017-01-07  1.0      11
        2017-01-08  0.0      11
        2017-01-09  0.0      11
        2017-01-10  7.0      18
1       2018-06-29  1.0       1
        2018-06-30  4.0       5
        2018-07-01  1.0       6
        2018-07-02  1.0       7
        2018-07-03  0.0       7
        2018-07-04  2.0       9

Is there a way to apply a function to a MultiIndex dataframe slice with the same outer index without iterating each slice?

1 Answers1