Incremental Counter flag for a matching condition on subsequent time series data

Question

I have a dataframe that looks like below

ID      DATE          PROFIT
2342  2017-03-01       457
2342  2017-06-01       658
2342  2017-09-01       3456
2342  2017-12-01       345
2342  2018-03-01       235
2342  2018-06-01       23
808   2016-12-01       200        
808   2017-03-01       9346
808   2017-06-01       54
808   2017-09-01       314
808   2017-12-01       57
....
....

For each ID:

I want to find out if the Profit has stayed between 200 and 1000. I want to do it in such a way that a counter( a new column) indicates how many quarters (latest and previous) in succession have satisfied this condition. If for some reason, one of the intermediate quarters does not match the condition, the counter should reset.

So the output should look something like :

ID      DATE          PROFIT    COUNTER
2342  2017-03-01       457        1
2342  2017-06-01       658        2
2342  2017-09-01       3456       0
2342  2017-12-01       345        1
2342  2018-03-01       235        2
2342  2018-06-01       23         0
808   2016-12-01       200        1
808   2017-03-01       9346       0
808   2017-06-01       54         0
808   2017-09-01       314        1
808   2017-12-01       57         0
....
....

I am thinking of using the shift functionality to access/condition on the previous rows, however if there is a better way to check if condition in datetime values, it will be good to know.

@U9-Forward Thanks for pointing out; it was a typo. I have edited it now — asimo, May 01 '19 at 00:52
This answer seems helps. https://stackoverflow.com/questions/27626542/counting-consecutive-positive-value-in-python-array — Dawei, May 01 '19 at 00:55
Thanks @Dawei, I had a look, its good for a start, but I also need to factor in the conditions (lt and gt). A pythonic way of handling this condition on timeseries data is what am looking for — asimo, May 01 '19 at 01:00

score 3 · Answer 1 · answered May 01 '19 at 01:18

IIUC Create the help key by using cumsum , then we just need to filter before assign back and fillna which is not between 200 to 1000 as 0

s=(~df.PROFIT.between(200,1000)).groupby(df['ID']).cumsum()
df['COUNTER']=df[df.PROFIT.between(200,1000)].groupby([df.ID,s]).cumcount()+1
df.COUNTER.fillna(0,inplace=True)
df
Out[226]: 
      ID        DATE  PROFIT  COUNTER
0   2342  2017-03-01     457      1.0
1   2342  2017-06-01     658      2.0
2   2342  2017-09-01    3456      0.0
3   2342  2017-12-01     345      1.0
4   2342  2018-03-01     235      2.0
5   2342  2018-06-01      23      0.0
6    808  2016-12-01     200      1.0
7    808  2017-03-01    9346      0.0
8    808  2017-06-01      54      0.0
9    808  2017-09-01     314      1.0
10   808  2017-12-01      57      0.0

score 2 · Answer 2 · answered May 01 '19 at 01:12

Set up a criteria column with value 1 meets criteria, then group and sum.

df['criteria'] = 0

df.loc[(df['PROFIT'] >= 200) & (df['PROFIT'] <= 1000), 'criteria'] = 1

df['result'] = df.groupby(['ID', df.criteria.eq(0).cumsum()])['criteria'].cumsum()


     ID        DATE  PROFIT  criteria  result
0   2342  2017-03-01     457         1       1
1   2342  2017-06-01     658         1       2
2   2342  2017-09-01    3456         0       0
3   2342  2017-12-01     345         1       1
4   2342  2018-03-01     235         1       2
5   2342  2018-06-01      23         0       0
6    808  2016-12-01     200         1       1
7    808  2017-03-01    9346         0       0
8    808  2017-06-01      54         0       0
9    808  2017-09-01     314         1       1
10   808  2017-12-01      57         0       0

U13-Forward · Answer 3 · 2019-05-01T23:45:25.217

0

Use groupby with a cumsum and a cumcount, then simply use loc, to get first rows and make them as you want them:

df['BOOL'] = (~df['PROFIT'].between(200, 1000)).cumsum()
df['COUNTER'] = df.groupby('BOOL', 'ID']).cumcount()
df.loc[df.groupby('ID', as_index=False)['BOOL'].apply(lambda x: x.loc[:x.idxmin()-1]).index.levels[1], 'COUNTER'] += 1

And now:

print(df)

Is:

      ID        DATE  PROFIT  COUNTER
0   2342  2017-03-01     457        1
1   2342  2017-06-01     658        2
2   2342  2017-09-01    3456        0
3   2342  2017-12-01     345        1
4   2342  2018-03-01     235        2
5   2342  2018-06-01      23        0
6    808  2016-12-01     200        1
7    808  2017-03-01    9346        0
8    808  2017-06-01      54        0
9    808  2017-09-01     314        1
10   808  2017-12-01      57        0

As you shown in the desired output.

edited May 01 '19 at 23:45

answered May 01 '19 at 01:04

U13-Forward

69,221
14
89
114

The output looks right, but just curious how/where the ID column is getting grouped in your script? – asimo May 01 '19 at 01:08
@asimo but still, you don't really need it here, since the last row of the `2342` `ID` group is smaller than `200` (not in the range) – U13-Forward May 01 '19 at 01:09
In the sample df I pasted, it is not needed, but in the question, I have mentioned at the start...That this should happen for each ID – asimo May 01 '19 at 01:10
@asimo Sorry, but i have to leave, so i can't edit... maybe in 4 hours i can start to try again. sorry for the inconvenience :-) – U13-Forward May 01 '19 at 01:41
@asimo Back on computer, more than double the time of what i thougth, not a lot of time tho – U13-Forward May 01 '19 at 12:42

Dawei · Answer 4 · 2019-05-01T01:33:22.777

def magic(y):
    return y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)

data["condition"] = data['PROFIT'].between(200, 1000)
data["COUNTER"] = data.groupby("ID").condition.apply(magic)


      ID        DATE  PROFIT  condition  COUNTER
0   2342  2017-03-01     457       True        1
1   2342  2017-06-01     658       True        2
2   2342  2017-09-01    3456      False        0
3   2342  2017-12-01     345       True        1
4   2342  2018-03-01     235       True        2
5   2342  2018-06-01      23      False        0
6    808  2016-12-01     200       True        1
7    808  2017-03-01    9346      False        0
8    808  2017-06-01      54      False        0
9    808  2017-09-01     314       True        1
10   808  2017-12-01      57      False        0

score -1 · Answer 5 · answered May 01 '19 at 00:51

-1

Wouldn't something as simple as the following work?

if profit_value>200 and profit_value<1000:
   cntr+=1
else:
   cntr=0

answered May 01 '19 at 00:51

BenI

162
5

1

Eeek!, loop in pandas? – U13-Forward May 01 '19 at 00:52
It needs to be grouped by each ID Counter has to be a pandas column TimeData needs to be factored in – asimo May 01 '19 at 00:53

Incremental Counter flag for a matching condition on subsequent time series data

5 Answers5