Computing cumulative values for each year of a dataframe separately

Question

I have the foll. pandas dataframe with datetime index:

datetime       VAL
2000-01-01   -283.0000
2000-01-02   -283.0000
2000-01-03    -10.6710
2000-01-04    -12.2700
2000-01-05    -10.7855
2001-01-06     -9.1480
2001-01-07     -9.5300
2001-01-08    -10.4675
2001-01-09    -10.9205
2001-01-10    -11.5715

I would like to compute cumulative values for each year and replace the VAL column by the cumulative values. E.g, It will look something like this:

datetime       VAL
2000-01-01   -283.0000
2000-01-02   -283.0000 + -283.0000
2000-01-03    -10.6710 + -283.0000 + -283.0000
2000-01-04    -12.2700 + -10.6710 + -283.0000 + -283.0000
2000-01-05    -10.7855 + -12.2700 + -10.6710 + -283.0000 + -283.0000
2001-01-06     -9.1480
2001-01-07     -9.5300 + -9.5300
2001-01-08    -10.4675 + -10.4675
2001-01-09    -10.9205 + -10.9205
2001-01-10    -11.5715 + -11.5715

I haven't done the actual calculations which is why you see -283.000 + -283.000 instead of -566.0000

Not sure how to proceed with this, I could do a groupby and then?

score 5 · Accepted Answer · answered Jan 27 '16 at 05:37

5

You can access the year via .year on a DateTimeIndex, and pass that to groupby:

>>> df["cumulative_VAL"] = df.groupby(df.index.year)["VAL"].cumsum()
>>> df
                 VAL  cumulative_VAL
datetime                            
2000-01-01 -283.0000       -283.0000
2000-01-02 -283.0000       -566.0000
2000-01-03  -10.6710       -576.6710
2000-01-04  -12.2700       -588.9410
2000-01-05  -10.7855       -599.7265
2001-01-06   -9.1480         -9.1480
2001-01-07   -9.5300        -18.6780
2001-01-08  -10.4675        -29.1455
2001-01-09  -10.9205        -40.0660
2001-01-10  -11.5715        -51.6375

answered Jan 27 '16 at 05:37

DSM

342,061
65
592
494

thanks @DSM, is there a difference between doing `df.groupby(df.index.year)["VAL"].cumsum()` and `df['VAL'].groupby(df.index.year).cumsum()`? – user308827 Jan 27 '16 at 05:46
1

@user308827: not here, or least not that I can think of offhand. In both cases you've selected the column VAL before you apply any operations. – DSM Jan 27 '16 at 05:50

score 1 · Answer 2 · edited May 23 '17 at 12:07

1

Use numpy.cumsum()

>>> a = np.array([[1,2,3], [4,5,6]])
>>> a array([[1, 2, 3],
       [4, 5, 6]])
>>> np.cumsum(a) array([ 1,  3,  6, 10, 15, 21])
>>> np.cumsum(a, dtype=float)     # specifies type of output value(s) array([  1.,   3.,   6.,  10.,  15.,  21.])

http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.cumsum.html

To groupby year, you can use:

data.groupby(data['datetime'].map(lambda x: x.year))

How to group pandas DataFrame entries by date in a non-unique column

edited May 23 '17 at 12:07

Community

1
1

answered Jan 27 '16 at 05:23

drum

5,416
7
57
91

thanks @drum, cumsum works but how to use it along with groupby? – user308827 Jan 27 '16 at 05:24

Computing cumulative values for each year of a dataframe separately

2 Answers2