22

If I have a pandas.core.series.Series named ts of either 1's or NaN's like this:

3382   NaN
3381   NaN
...
3369   NaN
3368   NaN
...
15     1
10   NaN
11     1
12     1
13     1
9    NaN
8    NaN
7    NaN
6    NaN
3    NaN
4      1
5      1
2    NaN
1    NaN
0    NaN

I would like to calculate cumsum of this serie but it should be reset (set to zero) at the location of the NaNs like below:

3382   0
3381   0
...
3369   0
3368   0
...
15     1
10     0
11     1
12     2
13     3
9      0
8      0
7      0
6      0
3      0
4      1
5      2
2      0
1      0
0      0

Ideally I would like to have a vectorized solution !

I ever see a similar question with Matlab : Matlab cumsum reset at NaN?

but I don't know how to translate this line d = diff([0 c(n)]);

Community
  • 1
  • 1
working4coins
  • 1,997
  • 3
  • 22
  • 30

5 Answers5

14

Even more pandas-onic way to do it:

v = pd.Series([1., 3., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
cumsum = v.cumsum().fillna(method='pad')
reset = -cumsum[v.isnull()].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()

Contrary to the matlab code, this also works for values different from 1.

kadee
  • 8,067
  • 1
  • 39
  • 31
  • 2
    This is the best answer of the lot. If you want to understand how it works, just add one more line at the end: `print(pd.DataFrame({'v': v, 'cum': cumsum, 'reset': reset, 'result': result}))`, and run this code. – Roobie Nuby Nov 13 '18 at 10:10
13

A simple Numpy translation of your Matlab code is this:

import numpy as np

v = np.array([1., 1., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
n = np.isnan(v)
a = ~n
c = np.cumsum(a)
d = np.diff(np.concatenate(([0.], c[n])))
v[n] = -d
np.cumsum(v)

Executing this code returns the result array([ 1., 2., 3., 0., 1., 2., 3., 4., 0., 1.]). This solution will only be as valid as the original one, but maybe it will help you come up with something better if it isn't sufficient for your purposes.

emprice
  • 912
  • 11
  • 21
10

Here's a slightly more pandas-onic way to do it:

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()

Note that either of these requires that you're using pandas at least at 9da899b or newer. If you aren't then you can cast the bool dtype to an int64 or float64 dtype:

v = Series([1, 1, 1, nan, 1, 1, 1, 1, nan, 1], dtype=float)
n = v.isnull()
a = ~n
c = a.astype(float).cumsum()
index = c[n].index  # need the index for reconstruction after the np.diff
d = Series(np.diff(np.hstack(([0.], c[n]))), index=index)
v[n] = -d
result = v.cumsum()
Phillip Cloud
  • 24,919
  • 11
  • 68
  • 88
  • `ValueError: cannot convert float NaN to integer` for `ts.notnull.cumsum()` on pandas 0.12. I'm not sure why this would occur for a boolean series.. – machow Aug 12 '13 at 22:04
  • That should've been fixed by [`9da899b`](https://github.com/pydata/pandas/commit/9da899ba3d1099d7456adb32ea129547f152dee8) – Phillip Cloud Aug 12 '13 at 22:08
  • @Closed Make sure you're up to date and let me know if it still doesn't work. – Phillip Cloud Aug 12 '13 at 22:08
  • @Closed I've updated my answer for usage pre [`9da899b`](https://github.com/pydata/pandas/commit/9da899ba3d1099d7456adb32ea129547f152dee8). – Phillip Cloud Aug 12 '13 at 22:12
  • Thanks for your answer. @nosuchthingasstars 's answer is marked as solving this issue... but I also like your answer! You should write `ts = pd.Series(np.random.randint(10, size=1000), dtype=float)` – working4coins Aug 13 '13 at 06:40
  • I think there is a mistake `import pandas as pd import numpy as np #ts = pd.Series(np.random.randint(10, size=1000), dtype=float) #ts[np.random.rand(len(ts)) > 0.5] = np.nan ts = pd.Series([1,1,1,np.nan,np.nan,1,1]) valid = ts.notnull() #c = valid.cumsum() # with pd>=0.12 c = valid.astype(int).cumsum() # because of Pandas bug with bool and cumsum for pd<0.12 d = c.diff().fillna(0) ts[~valid] = d` because `ts= 1 1 1 0 0 1 1` it should output `1 2 3 0 0 1 2` – working4coins Aug 13 '13 at 07:40
  • @working4coins I've updated my answer to give the correct answer. – Phillip Cloud Aug 13 '13 at 16:08
  • Thanks but as you can see, it's very close from nosuchthingasstars solution. – working4coins Aug 13 '13 at 19:08
  • Will that answer maintain the index? – Phillip Cloud Aug 13 '13 at 20:36
6

If you can accept a similar boolean Series b, try

(b.cumsum() - b.cumsum().where(~b).fillna(method='pad').fillna(0)).astype(int)

Starting from your Series ts, either b = (ts == 1) or b = ~ts.isnull().

Adam Fuller
  • 183
  • 2
  • 9
1

You can do that with expanding().apply and replace with method='backfill'

reset_at = 0

ts.expanding().apply(
    lambda s:
        s[
            (s != reset_at).replace(True, method='backfill')
        ].sum()
).fillna(0)