I have the following df:

id    year_month    pct
10    201901        10
20    201901        5
30    201901        3
40    201901        2
10    201902        8
20    201902        2
30    201902        7
40    201902        3

I want to sort by pct, then group by year_month, and then do a cumsum on pct within each group and check whether it is > 10:

df.sort_values(['pct']).groupby('year_month')['pct'].apply(lambda x: x.cumsum().gt(10))

but it only gives me a Series:

3    False
5    False
2    False
7    False
1    False
6     True
4     True
0     True
Name: pct, dtype: bool

I am wondering how to get this Series back into df as a column:

id    year_month    pct    non-tail
10    201901        10     True
20    201901        5      False 
30    201901        3      False
40    201901        2      False
10    201902        8      True
20    201902        2      False
30    201902        7      True
40    201902        3      False
daiyue
  • `df=df.assign(non_tail=df.sort_values(['pct']).groupby('year_month')['pct'].cumsum().gt(10))`, just assign it back and get rid of the `lambda` (not required for `cumsum()`) – anky Jul 16 '19 at 15:59
  • a simple `df['non_tail'] = df.sort_values('pct')...` would be more memory-efficient? – Quang Hoang Jul 16 '19 at 16:08
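
A minimal sketch of what the two comments above suggest, assuming the sample frame from the question and the column name `non_tail` used by the commenters (the question's header spells it `non-tail`). It relies on pandas aligning on the index at assignment, so the sorted Series lands back on the right rows; treat it as one possible way, not a definitive answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [10, 20, 30, 40, 10, 20, 30, 40],
    "year_month": [201901] * 4 + [201902] * 4,
    "pct": [10, 5, 3, 2, 8, 2, 7, 3],
})

# Sort ascending by pct, then take the running total within each year_month.
# groupby(...).cumsum() keeps the original row labels, so no lambda/apply is needed.
running_total = df.sort_values("pct").groupby("year_month")["pct"].cumsum()

# Column assignment aligns on the index, so the sorted order of
# running_total does not matter; each value goes back to its own row.
df["non_tail"] = running_total.gt(10)

print(df)
```

With ascending order, the running totals are 2, 5, 10, 20 for 201901 and 2, 5, 12, 20 for 201902, so the flagged rows are pct 10 in 201901 and pct 7 and 8 in 201902.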

0 Answers