6

I have a dataframe that looks like below

ID      DATE          PROFIT
2342  2017-03-01       457
2342  2017-06-01       658
2342  2017-09-01       3456
2342  2017-12-01       345
2342  2018-03-01       235
2342  2018-06-01       23
808   2016-12-01       200        
808   2017-03-01       9346
808   2017-06-01       54
808   2017-09-01       314
808   2017-12-01       57
....
....

For each ID:

I want to find out if the Profit has stayed between 200 and 1000. I want to do it in such a way that a counter( a new column) indicates how many quarters (latest and previous) in succession have satisfied this condition. If for some reason, one of the intermediate quarters does not match the condition, the counter should reset.

So the output should look something like :

ID      DATE          PROFIT    COUNTER
2342  2017-03-01       457        1
2342  2017-06-01       658        2
2342  2017-09-01       3456       0
2342  2017-12-01       345        1
2342  2018-03-01       235        2
2342  2018-06-01       23         0
808   2016-12-01       200        1
808   2017-03-01       9346       0
808   2017-06-01       54         0
808   2017-09-01       314        1
808   2017-12-01       57         0
....
....

I am thinking of using the shift functionality to access/condition on the previous rows, however if there is a better way to check if condition in datetime values, it will be good to know.

asimo
  • 2,340
  • 11
  • 29

5 Answers5

3

IIUC Create the help key by using cumsum , then we just need to filter before assign back and fillna which is not between 200 to 1000 as 0

s=(~df.PROFIT.between(200,1000)).groupby(df['ID']).cumsum()
df['COUNTER']=df[df.PROFIT.between(200,1000)].groupby([df.ID,s]).cumcount()+1
df.COUNTER.fillna(0,inplace=True)
df
Out[226]: 
      ID        DATE  PROFIT  COUNTER
0   2342  2017-03-01     457      1.0
1   2342  2017-06-01     658      2.0
2   2342  2017-09-01    3456      0.0
3   2342  2017-12-01     345      1.0
4   2342  2018-03-01     235      2.0
5   2342  2018-06-01      23      0.0
6    808  2016-12-01     200      1.0
7    808  2017-03-01    9346      0.0
8    808  2017-06-01      54      0.0
9    808  2017-09-01     314      1.0
10   808  2017-12-01      57      0.0
BENY
  • 317,841
  • 20
  • 164
  • 234
2

Set up a criteria column with value 1 meets criteria, then group and sum.

df['criteria'] = 0

df.loc[(df['PROFIT'] >= 200) & (df['PROFIT'] <= 1000), 'criteria'] = 1

df['result'] = df.groupby(['ID', df.criteria.eq(0).cumsum()])['criteria'].cumsum()


     ID        DATE  PROFIT  criteria  result
0   2342  2017-03-01     457         1       1
1   2342  2017-06-01     658         1       2
2   2342  2017-09-01    3456         0       0
3   2342  2017-12-01     345         1       1
4   2342  2018-03-01     235         1       2
5   2342  2018-06-01      23         0       0
6    808  2016-12-01     200         1       1
7    808  2017-03-01    9346         0       0
8    808  2017-06-01      54         0       0
9    808  2017-09-01     314         1       1
10   808  2017-12-01      57         0       0
run-out
  • 3,114
  • 1
  • 9
  • 25
0

Use groupby with a cumsum and a cumcount, then simply use loc, to get first rows and make them as you want them:

df['BOOL'] = (~df['PROFIT'].between(200, 1000)).cumsum()
df['COUNTER'] = df.groupby('BOOL', 'ID']).cumcount()
df.loc[df.groupby('ID', as_index=False)['BOOL'].apply(lambda x: x.loc[:x.idxmin()-1]).index.levels[1], 'COUNTER'] += 1

And now:

print(df)

Is:

      ID        DATE  PROFIT  COUNTER
0   2342  2017-03-01     457        1
1   2342  2017-06-01     658        2
2   2342  2017-09-01    3456        0
3   2342  2017-12-01     345        1
4   2342  2018-03-01     235        2
5   2342  2018-06-01      23        0
6    808  2016-12-01     200        1
7    808  2017-03-01    9346        0
8    808  2017-06-01      54        0
9    808  2017-09-01     314        1
10   808  2017-12-01      57        0

As you shown in the desired output.

U13-Forward
  • 69,221
  • 14
  • 89
  • 114
  • The output looks right, but just curious how/where the ID column is getting grouped in your script? – asimo May 01 '19 at 01:08
  • @asimo but still, you don't really need it here, since the last row of the `2342` `ID` group is smaller than `200` (not in the range) – U13-Forward May 01 '19 at 01:09
  • In the sample df I pasted, it is not needed, but in the question, I have mentioned at the start...That this should happen for each ID – asimo May 01 '19 at 01:10
  • @asimo Sorry, but i have to leave, so i can't edit... maybe in 4 hours i can start to try again. sorry for the inconvenience :-) – U13-Forward May 01 '19 at 01:41
  • @asimo Back on computer, more than double the time of what i thougth, not a lot of time tho – U13-Forward May 01 '19 at 12:42
0
def magic(y):
    return y * (y.groupby((y != y.shift()).cumsum()).cumcount() + 1)

data["condition"] = data['PROFIT'].between(200, 1000)
data["COUNTER"] = data.groupby("ID").condition.apply(magic)


      ID        DATE  PROFIT  condition  COUNTER
0   2342  2017-03-01     457       True        1
1   2342  2017-06-01     658       True        2
2   2342  2017-09-01    3456      False        0
3   2342  2017-12-01     345       True        1
4   2342  2018-03-01     235       True        2
5   2342  2018-06-01      23      False        0
6    808  2016-12-01     200       True        1
7    808  2017-03-01    9346      False        0
8    808  2017-06-01      54      False        0
9    808  2017-09-01     314       True        1
10   808  2017-12-01      57      False        0
Dawei
  • 1,046
  • 12
  • 21
-1

Wouldn't something as simple as the following work?

if profit_value>200 and profit_value<1000:
   cntr+=1
else:
   cntr=0
BenI
  • 162
  • 5