I have a pandas dataframe column that looks as follows:

col1
200    
300    
400    
200    
500    
700    
0      
0      
60     
0      
0      

I'm trying to create a new column whose value depends on the current row and also on the next few rows. For example: if col1[i] > 60 and (col1[i+1] + col1[i+2] + col1[i+3]) > 100, write 'yes' in col2, otherwise 'no'.

col1   col2
200    yes
300    yes
400    yes
200    yes
500    yes
700    yes
0      no
0      no
60     no
0      no
0      no

Any ideas on how to achieve this?

N.Foe

1 Answer

You can handle this directly in the DataFrame with the .shift() method: shifting col1 by -1, -2 and -3 lines each row up with the values of the next three rows.
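A minimal sketch of the .shift() idea, using the sample data from the question. Treating rows past the end of the column as 0 (via fill_value=0) is my assumption, not something the question specifies, and the name next3 is just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})

# Sum of the next three rows for every row.
# Assumption: rows beyond the end of the column count as 0.
next3 = sum(df["col1"].shift(-k, fill_value=0) for k in (1, 2, 3))

# Element-wise condition: on Series, use & with each comparison parenthesised.
df["col2"] = np.where((df["col1"] > 60) & (next3 > 100), "yes", "no")
print(df)
```

Note that under the rule exactly as written, the 700 row comes out as 'no', because the three values after it (0, 0, 60) sum to 60. If that row should be 'yes', the threshold or the window needs adjusting.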

Here's one approach which uses zip and list:

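import pandas as pd  # df is assumed to already exist with a 'col1' column

l1 = df['col1'].tolist()
l2 = []
# Pair each value with the three values that follow it
for a, b, c, d in zip(l1, l1[1:], l1[2:], l1[3:]):
    # Use `and` here: `a>60 & (b+c+d)>100` parses as a > (60 & (b+c+d)) > 100
    if a > 60 and (b + c + d) > 100:
        l2.append('yes')
    else:
        l2.append('no')
# Cater for the last 3 entries, which have fewer than three following rows.
# I've left them blank; you can decide whether to go for 'yes', 'no' or 'NA'.
l2 += ['', '', '']
df['col2'] = l2  # same length as df, so values are assigned row by row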
Ji Wei
    That's been really helpful for my analysis! It's kind of general so I can easily apply it to other scenarios. Thank you. – N.Foe Jul 10 '20 at 11:57