
I have the following pandas.DataFrame:

      id  year          x
0  01001  2015  0
1  01001  2016  0.5
2  01001  2017  0
3  01001  2018  0
4  01002  2015  0
5  01002  2016  0
6  01002  2017  0.0667525
7  01002  2018  0.133505

My goal is to create a new column, y, that is 1 for every row after the first row where x > 0 within each id group, and 0 for that first row and all preceding rows. That is:

      id  year          x  y
0  01001  2015  0          0
1  01001  2016  0.5        0
2  01001  2017  0          1
3  01001  2018  0          1
4  01002  2015  0          0
5  01002  2016  0          0
6  01002  2017  0.0667525  0
7  01002  2018  0.133505   1

How can this be done?

This is what I came up with: get the index of the first row where x > 0 in each group, fill the new column with 1 from that index to the end of the partition, and then take those indices and set their rows back to 0 (a rough sketch of this follows the example data below).

Here's a reproducible example of my data:

import pandas as pd

t = pd.DataFrame({'id': {0: '01001', 1: '01001', 2: '01001', 3: '01001',
                         4: '01002', 5: '01002', 6: '01002', 7: '01002'},
                  'x': {0: 0.0, 1: 0.5, 2: 0.0, 3: 0.0, 4: 0.0,
                        5: 0.0, 6: 0.06675245612859726, 7: 0.13350491651818122},
                  'year': {0: 2015, 1: 2016, 2: 2017, 3: 2018,
                           4: 2015, 5: 2016, 6: 2017, 7: 2018}})
t
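
Here is a rough sketch of that idea (the variable names are just illustrative, and it assumes every id has at least one row with x > 0, as in the sample):

first_idx = (t['x'] > 0).groupby(t['id']).transform('idxmax')  # index of the first x > 0 in each id group
t['y'] = (t.index >= first_idx).astype(int)                    # 1 from that row to the end of the group
t.loc[first_idx.unique(), 'y'] = 0                             # then zero out the first-occurrence rows themselves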
Arturo Sbr
  • Thanks for the edit, have you tried `df['y'] = df.groupby('id', as_index=False).apply(lambda p: (p['x'].shift().gt(0)).astype(int)).droplevel(0, axis=0)`? – wwnde Oct 16 '21 at 01:00
    Ah! So close! I hope this last edit showcases my problem better. – Arturo Sbr Oct 16 '21 at 01:04
  • Yep, use cumsum: `df['y'] = df.groupby('id', as_index=False).apply(lambda p: (p['x'].shift().gt(0)).cumsum()).droplevel(0, axis=0)` – wwnde Oct 16 '21 at 01:09

3 Answers


Let us try `transform`:

df['y'] = (df.index > (df['x']>0).groupby(df['id']).transform('idxmax')).astype(int)
df
      id  year         x  y
0  01001  2015  0.000000  0
1  01001  2016  0.500000  0
2  01001  2017  0.000000  1
3  01001  2018  0.000000  1
4  01002  2015  0.000000  0
5  01002  2016  0.000000  0
6  01002  2017  0.066753  0
7  01002  2018  0.133505  1
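
To see why this works: `idxmax` on the boolean mask `x > 0` returns, per `id` group, the index label of the first `True`, and `transform` broadcasts that label to every row of the group, so comparing `df.index` against it flags exactly the rows strictly after the first positive `x`. A quick look at the intermediate values (a sketch assuming the question's `t` and its default RangeIndex):

first_hit = (t['x'] > 0).groupby(t['id']).transform('idxmax')
first_hit.tolist()   # [1, 1, 1, 1, 6, 6, 6, 6]
t['y'] = (t.index > first_hit).astype(int)
t['y'].tolist()      # [0, 0, 1, 1, 0, 0, 0, 1]

Note that if a group never has x > 0, `idxmax` falls back to the group's first label, so every later row of that group would still get 1.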
BENY

What about checking consecutive occurrence conditionally?

df['y'] = df.groupby('id', as_index=False).apply(lambda p: (p['x'].shift().gt(0)).cumsum()).droplevel(0, axis=0)



      id  year         x  y
0  01001  2015  0.000000  0
1  01001  2016  0.500000  0
2  01001  2017  0.000000  1
3  01001  2018  0.000000  1
4  01002  2015  0.000000  0
5  01002  2016  0.000000  0
6  01002  2017  0.066753  0
7  01002  2018  0.133505  1
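
In other words, `shift().gt(0)` is `True` on the row immediately after a positive `x`, and the cumulative sum carries that flag forward to the end of the group. The intermediate flag for the question's `t` (a sketch written with an equivalent group-wise `shift` for brevity):

flag = t.groupby('id')['x'].shift().gt(0)   # True on the row right after x > 0 within each id
flag.tolist()                               # [False, False, True, False, False, False, False, True]
t['y'] = flag.groupby(t['id']).cumsum()     # 0 before the flag, then a running count of flags afterwards

If `x` can be positive on more than one row of a group, the cumulative sum can exceed 1; wrapping the result in `.gt(0).astype(int)` would keep the column binary.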
wwnde
  • This is actually really clever. Can you do the same grouping by `id`? In its current form, I don't think this would work whenever the first row of a group starts off being >0. – Arturo Sbr Oct 16 '21 at 00:39
  • Do you mean, `df['y'] = df.groupby('id', as_index=False).apply(lambda p: (p['x'].gt(0)&p['x'].shift().gt(0)).astype(int)).droplevel(0, axis=0)`? – wwnde Oct 16 '21 at 00:51
  • I'm trying to think if this would work if a group shows the following behavior: `0, .5, 0, 0`. That's the whole reason I was trying to fetch the first occurrence. Regardless, I think this solves most of my issues. – Arturo Sbr Oct 16 '21 at 00:53
  • Modify your input and output for clarity. However, it looks to me like your initial question was answered :-) – wwnde Oct 16 '21 at 00:56
  • What if we tried `df['y'] = df.groupby('id', as_index=False).apply(lambda p: (p['x'].shift().gt(0)).astype(int)).droplevel(0, axis=0)`? – wwnde Oct 16 '21 at 00:59
  • Have you tried `df['y'] = df.groupby('id', as_index=False).apply(lambda p: (p['x'].shift().gt(0)).cumsum()).droplevel(0, axis=0)` – wwnde Oct 16 '21 at 01:08

Here is a way using `cumprod`:

df.groupby('id')['x'].transform(lambda x: (~x.eq(0).shift().cumprod().astype(bool)).astype(int))
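
Within each group, `x.eq(0).shift().cumprod()` stays nonzero only while every preceding row had `x == 0` (the leading NaN from `shift` also casts to `True`), so once a prior row has `x > 0` the product is zeroed out and the negation flags all remaining rows. Assigning the result to a new column (a sketch using the question's `t`):

t['y'] = t.groupby('id')['x'].transform(
    lambda x: (~x.eq(0).shift().cumprod().astype(bool)).astype(int))
t['y'].tolist()   # [0, 0, 1, 1, 0, 0, 0, 1]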
rhug123