I have read related questions like this one and this blog post.
Unfortunately I am unable to modify the solutions to my needs.
Consider a Series with a DatetimeIndex which may look like this:
Code to instantiate an example:
import pandas as pd
s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0], index=pd.date_range(start=0, freq='1d', periods=12), name='A')
Ultimately, I want to get the result (t4 - t1) + (t8 - t5) + (t10 - t8).
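For the example series this target can be evaluated directly. The sketch below assumes that t0 to t11 denote the 12 index timestamps in order, counting from 0 (the labels came from the picture, which is not reproduced here). With daily steps the three gaps are 3, 3 and 2 days:

```python
import pandas as pd

# Assumption: t[i] is the i-th timestamp of the example, counting from 0.
t = pd.date_range(start='1970-01-01', freq='D', periods=12)
expected = (t[4] - t[1]) + (t[8] - t[5]) + (t[10] - t[8])
print(expected)  # 8 days 00:00:00
```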
This means I need to identify streaks of 1 padded with 0 on each side. I can do everything after that myself, i.e. grouping by streak (possibly with cumcount) and diffing the first and last timestamp in each group.
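One complication with grouping the padding 0s themselves is that a single 0 can belong to two streaks (the 0 at t8 closes one streak and opens the next). A possible workaround, sketched here under that assumption: group only the 1s, numbering each run with a cumsum over value changes (cumsum rather than cumcount), and per run take the timestamp just before its first 1 and just after its last 1. The names `prev_ts`, `next_ts`, `run_id` and `spans` are hypothetical, chosen for this illustration:

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
              index=pd.date_range(start='1970-01-01', freq='D', periods=12), name='A')

ts = s.index.to_series()
# Timestamp of the neighbouring row; at either end fall back to the row's own
# timestamp, i.e. an implicit 0 "at the same timestamp".
prev_ts = ts.shift(1).fillna(ts)
next_ts = ts.shift(-1).fillna(ts)

# Label every run of equal values (a new label starts whenever the value changes).
run_id = s.ne(s.shift()).cumsum()
ones = s.eq(1)

# For each run of 1s: the 0 before its first element and the 0 after its last.
spans = pd.DataFrame({'run': run_id[ones], 'start': prev_ts[ones], 'end': next_ts[ones]})
per_streak = spans.groupby('run').agg(start=('start', 'first'), end=('end', 'last'))
total = (per_streak['end'] - per_streak['start']).sum()
print(total)  # 8 days 00:00:00
```

Because each row only looks at its own neighbours, the shared 0 at t8 can serve as the end of one streak and the start of the next without any special handling.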
There are some special cases when the Series starts/ends with a 1. In that case I want to treat it as if it were preceded/followed by a 0 at the same timestamp, e.g.
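To make the rule concrete, here is what it implies for a small made-up series that starts and ends with 1 (`s_edge` is hypothetical, not from the question). The leading streak runs from its implicit 0 at t0 to the real 0 at t2, and the trailing streak from the real 0 at t3 to its implicit 0 at t4:

```python
import pandas as pd

t = pd.date_range(start='1970-01-01', freq='D', periods=5)
s_edge = pd.Series([1, 1, 0, 0, 1], index=t, name='A')  # hypothetical data

# Leading streak: implicit 0 at t[0] itself, closed by the real 0 at t[2].
# Trailing streak: opened by the real 0 at t[3], implicit 0 at t[4] itself.
expected_edge = (t[2] - t[0]) + (t[4] - t[3])
print(expected_edge)  # 3 days 00:00:00
```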
Attempt so far:
I'm going to concat some sub-solutions for easier visualization.
Pad the series with a zero on each end, to avoid special cases.
s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0], index=pd.date_range(start=0, freq='1d', periods=12), name='A')
s = pd.concat([pd.Series([0], index=s.index[:1], name=s.name), s, pd.Series([0], index=s.index[-1:], name=s.name)])
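A quick check that this padding does put a 0 in front of leading 1s and after trailing 1s, at the same timestamps, using a made-up series that starts and ends with 1 (`s_edge` is hypothetical). The markers are computed on raw NumPy positions rather than with `diff`, purely to avoid any label alignment on the now non-unique index; that is a choice made for this sketch, not something the padding requires:

```python
import pandas as pd

t = pd.date_range(start='1970-01-01', freq='D', periods=5)
s_edge = pd.Series([1, 1, 0, 0, 1], index=t, name='A')  # hypothetical data

# Pad with a 0 at the first and last timestamps; the duplicate labels are intended.
padded = pd.concat([pd.Series([0], index=s_edge.index[:1], name='A'),
                    s_edge,
                    pd.Series([0], index=s_edge.index[-1:], name='A')])

# Use positions rather than labels, because the index is no longer unique.
v = padded.to_numpy()
starter = (v[:-1] == 0) & (v[1:] == 1)  # a 0 immediately followed by a 1
ender = (v[:-1] == 1) & (v[1:] == 0)    # a 1 immediately followed by a 0
starter_ts = padded.index[:-1][starter]  # timestamp of that 0
ender_ts = padded.index[1:][ender]       # timestamp of the 0 after the 1
```

Here `starter_ts` comes out as [t0, t3] and `ender_ts` as [t2, t4], which matches the "same timestamp" convention for both edge streaks.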
Get the last 0 before and the first 0 after a streak of ones.

>>> tmp = pd.concat([s, s.diff(-1).eq(-1).astype(int).rename('starter'), s.diff(1).eq(-1).astype(int).rename('ender')], axis=1)
>>> tmp
            A  starter  ender
1970-01-01  0        0      0
1970-01-02  0        1      0
1970-01-03  1        0      0
1970-01-04  1        0      0
1970-01-05  0        0      1
1970-01-06  0        1      0
1970-01-07  1        0      0
1970-01-08  1        0      0
1970-01-09  0        1      1
1970-01-10  1        0      0
1970-01-11  0        0      1
1970-01-12  0        0      0
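For 0/1 data, `s.diff(-1)` is `s - s.shift(-1)`, so `.eq(-1)` picks out exactly the 0s that are followed by a 1, and `s.diff(1).eq(-1)` picks out the 0s that follow a 1. The snippet below checks that reading against a `shift`-based spelling of the same markers (the `*_alt` names exist only for this comparison), which some readers may find more explicit:

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
              index=pd.date_range(start='1970-01-01', freq='D', periods=12), name='A')

# s.diff(-1) == s - s.shift(-1): equals -1 exactly where a 0 is followed by a 1.
starter = s.diff(-1).eq(-1)
# s.diff(1) == s - s.shift(1): equals -1 exactly where a 0 follows a 1.
ender = s.diff(1).eq(-1)

# The same markers spelled out with shift (NaN at the edges compares as False).
starter_alt = s.eq(0) & s.shift(-1).eq(1)
ender_alt = s.eq(0) & s.shift(1).eq(1)

assert starter.equals(starter_alt)
assert ender.equals(ender_alt)
```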
Fill single zero gaps in the 'A' column with 1, because they don't change the desired result. (This step might not be necessary but helps the visualization.)

>>> tmp.loc[(both := tmp['starter'].eq(1) & tmp['ender'].eq(1)), 'A'] = 1
>>> tmp
            A  starter  ender
1970-01-01  0        0      0
1970-01-02  0        1      0
1970-01-03  1        0      0
1970-01-04  1        0      0
1970-01-05  0        0      1
1970-01-06  0        1      0
1970-01-07  1        0      0
1970-01-08  1        0      0
1970-01-09  1        1      1
1970-01-10  1        0      0
1970-01-11  0        0      1
1970-01-12  0        0      0
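The reason filling a single 0 is harmless is that the shared boundary timestamp cancels: (t8 - t5) + (t10 - t8) = t10 - t5. With the example's timestamps (assuming t_i is the i-th index entry, counting from 0):

```python
import pandas as pd

t = pd.date_range(start='1970-01-01', freq='D', periods=12)

# Two streaks that share the single 0 at t[8] ...
separate = (t[8] - t[5]) + (t[10] - t[8])
# ... cover the same span as one streak running from t[5] to t[10].
merged = t[10] - t[5]
print(separate, merged)  # 5 days 00:00:00 5 days 00:00:00
```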
Adjust the 'starter' and 'ender' columns.

>>> tmp.loc[both, ['starter', 'ender']] = 0
>>> tmp
            A  starter  ender
1970-01-01  0        0      0
1970-01-02  0        1      0
1970-01-03  1        0      0
1970-01-04  1        0      0
1970-01-05  0        0      1
1970-01-06  0        1      0
1970-01-07  1        0      0
1970-01-08  1        0      0
1970-01-09  1        0      0
1970-01-10  1        0      0
1970-01-11  0        0      1
1970-01-12  0        0      0
And this is where I'm stuck.
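One possible way forward from this table, offered as a sketch rather than a tested answer: once the single-0 gaps are merged, every remaining starter is followed by its own ender before the next starter appears, so the k-th starter timestamp can be paired with the k-th ender timestamp and the differences summed. The snippet rebuilds `tmp` from the unpadded example, as in the tables above; the pairing assumption relies on the padding step when a streak touches either end of the series:

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0],
              index=pd.date_range(start='1970-01-01', freq='D', periods=12), name='A')

# Rebuild the 'starter'/'ender' table and the single-gap adjustments.
tmp = pd.concat([s,
                 s.diff(-1).eq(-1).astype(int).rename('starter'),
                 s.diff(1).eq(-1).astype(int).rename('ender')], axis=1)
both = tmp['starter'].eq(1) & tmp['ender'].eq(1)
tmp.loc[both, 'A'] = 1
tmp.loc[both, ['starter', 'ender']] = 0

# Starters and enders now alternate in time, so pair them positionally.
starts = tmp.index[tmp['starter'].eq(1).to_numpy()]
ends = tmp.index[tmp['ender'].eq(1).to_numpy()]
assert len(starts) == len(ends)
total = (ends - starts).sum()
print(total)  # 8 days 00:00:00
```

This gives (t4 - t1) + (t10 - t5), which equals the requested (t4 - t1) + (t8 - t5) + (t10 - t8) because the merged gap at t8 cancels.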