1

Basically, what I'm trying to accomplish is to fill the missing dates (creating new DataFrame rows) with respect to each product, then create a new column based on a cumulative sum of column 'A' (example shown below)

The data is a MultiIndex with (product, date) as indexes.

Basically I would like to apply this answer to a MultiIndex DataFrame using only the rightmost index and calculating a subsequent np.cumsum for each product (and all dates).

                    A
product    date 
0       2017-01-02  1
        2017-01-03  2
        2017-01-04  2
        2017-01-05  1
        2017-01-06  4
        2017-01-07  1
        2017-01-10  7
1       2018-06-29  1
        2018-06-30  4
        2018-07-01  1
        2018-07-02  1
        2018-07-04  2

What I want to accomplish (efficiently) is:

                    A      CumSum
product    date 
0       2017-01-02  1        1
        2017-01-03  2        3
        2017-01-04  2        5
        2017-01-05  1        6
        2017-01-06  4        10
        2017-01-07  1        11
        2017-01-08  0        11
        2017-01-09  0        11
        2017-01-10  7        18
1       2018-06-29  1        1
        2018-06-30  4        5
        2018-07-01  1        6
        2018-07-02  1        7
        2018-07-03  0        7
        2018-07-04  2        9

1 Answers1

1

You have 2 ways:

One way:
Using groupby with apply and with resample and cumsum. Finally, pd.concat result with df.A and fillna with 0

s = (df.reset_index(0).groupby('product').apply(lambda x: x.resample(rule='D')
                                                           .asfreq(0).A.cumsum()))
pd.concat([df.A, s.rename('cumsum')], axis=1).fillna(0)

Out[337]:
                      A  cumsum
product date
0       2017-01-02  1.0       1
        2017-01-03  2.0       3
        2017-01-04  2.0       5
        2017-01-05  1.0       6
        2017-01-06  4.0      10
        2017-01-07  1.0      11
        2017-01-08  0.0      11
        2017-01-09  0.0      11
        2017-01-10  7.0      18
1       2018-06-29  1.0       1
        2018-06-30  4.0       5
        2018-07-01  1.0       6
        2018-07-02  1.0       7
        2018-07-03  0.0       7
        2018-07-04  2.0       9

Another way:
you need 2 groupbys. First one for resample, 2nd one for cumsum. Finally, use pd.concat and fillna with 0

s1 = df.reset_index(0).groupby('product').resample(rule='D').asfreq(0).A
pd.concat([df.A, s1.groupby(level=0).cumsum().rename('cumsum')], axis=1).fillna(0)

Out[351]:
                      A  cumsum
product date
0       2017-01-02  1.0       1
        2017-01-03  2.0       3
        2017-01-04  2.0       5
        2017-01-05  1.0       6
        2017-01-06  4.0      10
        2017-01-07  1.0      11
        2017-01-08  0.0      11
        2017-01-09  0.0      11
        2017-01-10  7.0      18
1       2018-06-29  1.0       1
        2018-06-30  4.0       5
        2018-07-01  1.0       6
        2018-07-02  1.0       7
        2018-07-03  0.0       7
        2018-07-04  2.0       9
Andy L.
  • 24,909
  • 4
  • 17
  • 29