New column based on threshold of next rows in Python Pandas

Question

I have a pandas DataFrame with a column col1, shown in the table below.

I'm trying to create a new column, col2, whose value depends on the current row and also on the next few rows. For example, if col1[i] > 60 and col1[i+1] + col1[i+2] + col1[i+3] > 100, write 'yes' in col2; otherwise write 'no'.

col1   col2
200    yes
300    yes
400    yes
200    yes
500    yes
700    yes
0      no
0      no
60     no
0      no
0      no

Any ideas on how to achieve this?

score 1 · Accepted Answer · answered Jul 03 '20 at 11:25

You can handle this directly on the DataFrame with the .shift() method, which lines up each row with the rows that follow it.
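As a rough sketch of the .shift() route (not part of the original answer): the sample data is copied from the question, while the use of np.where and the choice to mark rows with fewer than three following rows as 'no' are assumptions made here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col1": [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})

# shift(-k) moves values up by k rows, so row i sees the value from row i+k.
# The sum is NaN wherever fewer than three rows remain below the current one.
next3 = df["col1"].shift(-1) + df["col1"].shift(-2) + df["col1"].shift(-3)

# Element-wise '&' is correct here because both sides are parenthesised
# boolean Series. Comparisons against NaN are False, so the last three rows
# fall through to 'no' (an assumption; pick another label if you prefer).
df["col2"] = np.where((df["col1"] > 60) & (next3 > 100), "yes", "no")

print(df)
```

Applied literally, the rule gives 'no' for the 700 row, because the three values after it (0, 0, 60) sum to 60, which is not greater than 100.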

Here's one approach which uses zip and list:

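# Assumes df already exists and has a 'col1' column
l1 = df['col1'].tolist()
l2 = []
# Pair each value with the three values that follow it
for a, b, c, d in zip(l1, l1[1:], l1[2:], l1[3:]):
    # Use 'and', not '&': '&' binds tighter than '>' and would change the test
    if a > 60 and (b + c + d) > 100:
        l2.append('yes')
    else:
        l2.append('no')
# Cater for the last 3 entries. I've left them blank; you can decide whether to go for 'yes', 'no' or 'NA'
l2 += [''] * (len(l1) - len(l2))
# Assign the list directly so values line up by position, whatever the index is
df['col2'] = l2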

Ji Wei


That's been really helpful for my analysis! It's kind of general so I can easily apply it to other scenarios. Thank you. – N.Foe Jul 10 '20 at 11:57
