Cumsum reset at NaN

Question

If I have a pandas.core.series.Series named ts of either 1's or NaN's like this:

3382   NaN
3381   NaN
...
3369   NaN
3368   NaN
...
15     1
10   NaN
11     1
12     1
13     1
9    NaN
8    NaN
7    NaN
6    NaN
3    NaN
4      1
5      1
2    NaN
1    NaN
0    NaN

I would like to calculate cumsum of this serie but it should be reset (set to zero) at the location of the NaNs like below:

Ideally I would like to have a vectorized solution !

I ever see a similar question with Matlab : Matlab cumsum reset at NaN?

but I don't know how to translate this line d = diff([0 c(n)]);

kadee · Answer 1 · 2016-04-05T20:53:34.277

14

Even more pandas-onic way to do it:

v = pd.Series([1., 3., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
cumsum = v.cumsum().fillna(method='pad')
reset = -cumsum[v.isnull()].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()

Contrary to the matlab code, this also works for values different from 1.

edited Apr 05 '16 at 20:53

answered Apr 05 '16 at 20:16

kadee

8,067
1
39
31

2

This is the best answer of the lot. If you want to understand how it works, just add one more line at the end: `print(pd.DataFrame({'v': v, 'cum': cumsum, 'reset': reset, 'result': result}))`, and run this code. – Roobie Nuby Nov 13 '18 at 10:10

score 13 · Accepted Answer · answered Aug 12 '13 at 21:46

13

A simple Numpy translation of your Matlab code is this:

import numpy as np

v = np.array([1., 1., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
n = np.isnan(v)
a = ~n
c = np.cumsum(a)
d = np.diff(np.concatenate(([0.], c[n])))
v[n] = -d
np.cumsum(v)

Executing this code returns the result array([ 1., 2., 3., 0., 1., 2., 3., 4., 0., 1.]). This solution will only be as valid as the original one, but maybe it will help you come up with something better if it isn't sufficient for your purposes.

answered Aug 12 '13 at 21:46

emprice

912
11
21

it wouldn't work with: v = np.array([1., 2., 4., np.nan, 1., 3., 1., 3., np.nan, 1.]) – Julian Sep 17 '20 at 07:30
2

If you change `a = ~n` to `a = np.nan_to_num(v)`, it also works for v with values other than 1. – Lu Kas Jun 30 '22 at 09:28

Phillip Cloud · Answer 3 · 2013-08-13T16:22:43.323

10

Here's a slightly more pandas-onic way to do it:

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

Note that either of these requires that you're using pandas at least at 9da899b or newer. If you aren't then you can cast the bool dtype to an int64 or float64 dtype:

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.astype(float).cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

edited Aug 13 '13 at 16:22

answered Aug 12 '13 at 21:54

Phillip Cloud

24,919
11
68
88

`ValueError: cannot convert float NaN to integer` for `ts.notnull.cumsum()` on pandas 0.12. I'm not sure why this would occur for a boolean series.. – machow Aug 12 '13 at 22:04
That should've been fixed by [`9da899b`](https://github.com/pydata/pandas/commit/9da899ba3d1099d7456adb32ea129547f152dee8) – Phillip Cloud Aug 12 '13 at 22:08
@Closed Make sure you're up to date and let me know if it still doesn't work. – Phillip Cloud Aug 12 '13 at 22:08
@Closed I've updated my answer for usage pre [`9da899b`](https://github.com/pydata/pandas/commit/9da899ba3d1099d7456adb32ea129547f152dee8). – Phillip Cloud Aug 12 '13 at 22:12
Thanks for your answer. @nosuchthingasstars 's answer is marked as solving this issue... but I also like your answer! You should write `ts = pd.Series(np.random.randint(10, size=1000), dtype=float)` – working4coins Aug 13 '13 at 06:40
I think there is a mistake `import pandas as pd import numpy as np #ts = pd.Series(np.random.randint(10, size=1000), dtype=float) #ts[np.random.rand(len(ts)) > 0.5] = np.nan ts = pd.Series([1,1,1,np.nan,np.nan,1,1]) valid = ts.notnull() #c = valid.cumsum() # with pd>=0.12 c = valid.astype(int).cumsum() # because of Pandas bug with bool and cumsum for pd<0.12 d = c.diff().fillna(0) ts[~valid] = d` because `ts= 1 1 1 0 0 1 1` it should output `1 2 3 0 0 1 2` – working4coins Aug 13 '13 at 07:40
@working4coins I've updated my answer to give the correct answer. – Phillip Cloud Aug 13 '13 at 16:08
Thanks but as you can see, it's very close from nosuchthingasstars solution. – working4coins Aug 13 '13 at 19:08
Will that answer maintain the index? – Phillip Cloud Aug 13 '13 at 20:36

score 6 · Answer 4 · answered Aug 10 '15 at 06:16

6

If you can accept a similar boolean Series b, try

(b.cumsum() - b.cumsum().where(~b).fillna(method='pad').fillna(0)).astype(int)

Starting from your Series ts, either b = (ts == 1) or b = ~ts.isnull().

answered Aug 10 '15 at 06:16

Adam Fuller

183
2
9

Sandro Salles · Answer 5 · 2022-03-25T17:13:52.407

1

You can do that with expanding().apply and replace with method='backfill'

reset_at = 0

ts.expanding().apply(
    lambda s:
        s[
            (s != reset_at).replace(True, method='backfill')
        ].sum()
).fillna(0)

edited Mar 25 '22 at 17:13

answered Mar 25 '22 at 16:03

Sandro Salles

11
2

Cumsum reset at NaN

5 Answers5

Linked