1

I am working on pandas and I have four column

Name    Sensex_index    Start_Date       End_Date
AAA        0.5           20/08/2016    25/09/2016 
AAA        0.8           26/08/2016    29/08/2016 
AAA        0.4           30/08/2016    31/08/2016
AAA        0.9           01/09/2016    05/09/2016
AAA        0.5           12/09/2016    22/09/2016
AAA        0.3           24/09/2016    29/09/2016
ABC        0.9           01/01/2017    15/01/2017
ABC        0.5           23/01/2017    30/01/2017
ABC        0.7           02/02/2017    15/03/2017

If the sensex index of same name increases from lower index and moves to higher index, then the Termination date is the previous value, for example, I am looking for the following output,

Name   Sensex_index  Actual_Start      Termination_Date 
AAA        0.5        20/08/2016          31/08/2016
AAA        0.8        20/08/2016          31/08/2016
AAA        0.4        20/08/2016          31/08/2016 [high to low; low to high,terminate]
AAA        0.9        01/09/2016          29/09/2016
AAA        0.5        01/09/2016          29/09/2016      
AAA        0.3        01/09/2016          29/09/2016 [end of AAA]
ABC        0.9        01/01/2017          30/01/2017  
ABC        0.5        01/01/2017          30/01/2017 [high to low; low to high,terminate]
ABC        0.7        02/02/2017          15/03/2017 [end of ABC]
  • Like I said last time you asked: http://stackoverflow.com/questions/43896261/how-to-use-pandas-to-add-new-column-using-if-statement#comment74826820_43896261 – Paul H May 10 '17 at 21:17

1 Answers1

0
#Setup
df = pd.DataFrame(data = [['AAA', 0.5, '20/08/2016', '25/09/2016'],
 ['AAA', 0.8, '26/08/2016', '29/08/2016'],
 ['AAA', 0.4, '30/08/2016', '31/08/2016'],
 ['AAA', 0.9, '01/09/2016', '05/09/2016'],
 ['AAA', 0.5, '12/09/2016', '22/09/2016'],
 ['AAA', 0.3, '24/09/2016', '29/09/2016'],
 ['ABC', 0.9, '01/01/2017', '15/01/2017'],
 ['ABC', 0.5, '23/01/2017', '30/01/2017'],
 ['ABC', 0.7, '02/02/2017', '15/03/2017']], columns = ['Name', 'Sensex_index', 'Start_Date', 'End_Date'])

#Find the rows where price change from high to low and then to high
df['change'] = df.groupby('Name')['Sensex_index'].apply(lambda x: x.rolling(3,center=True).apply(lambda y: True if (y[1]<y[0] and y[1]<y[2]) else False))
#Find the last row for each name
df.iloc[df.groupby('Name')['change'].tail(1).index, -1] = 1.0        
#Set End_Date as Termination_Date for those changing points
df['Termination_Date'] = df.apply(lambda x: x.End_Date if x.change>0 else np.nan, axis=1)
#Set Actual_Start
df['Actual_Start'] = df.apply(lambda x: x.Start_Date if (x.name==0 
                                                          or x.Name!= df.iloc[x.name-1]['Name'] 
                                                          or df.iloc[x.name-1]['change']>0) 
                                                     else np.nan, axis=1)
#back fill the Termination_Date for other rows.
df.Termination_Date.fillna(method='bfill', inplace=True)
#forward fill the Actual_Start for other rows.
df.Actual_Start.fillna(method='ffill', inplace=True)
print(df)
Allen Qin
  • 19,507
  • 8
  • 51
  • 67
  • No worries. Just updated the answer. Please let me know if it works. – Allen Qin May 10 '17 at 23:46
  • or x.name!= df.iloc[x.name-1]['name'] TypeError: ("unsupported operand type(s) for -: 'str' and 'int'", u'occurred at index 0') –  May 11 '17 at 09:33
  • x.name!= df.iloc[x.name-1]['name'] should be x.Name!= df.iloc[x.name-1]['Name']. Note it's capital Name. Please copy the code exactly the way it is and try again. – Allen Qin May 11 '17 at 09:40
  • Does it work now?you can execute the code line by line and track the change. Which line do u need help with? – Allen Qin May 12 '17 at 17:23
  • yes i tried! but it doesnt work :( i also have sensex index value as 0.6041229234567236; 0.6467984563211275. is that a problem ? –  May 14 '17 at 07:20
  • Hey! i find this problem very interesting ! could you add how it works along the answer ? –  May 14 '17 at 07:21
  • @Miss.X, can you be more specific? Does the code work with the example data? Do you only get the error with your real data? What error did you get? – Allen Qin May 14 '17 at 07:28
  • Hey mate ! I did not get any error ! but when i checked the output file, the actual start and termination date werent not as expected. they were just same as the start date and end date –  May 14 '17 at 07:30
  • I've added the test data I used in the answer. Can you copy all lines of my answer and execute to see if the output is correct? If it's correct, you need to check if you real data has the same structure as the test data. – Allen Qin May 14 '17 at 07:35
  • if my real data has sensex index as 0.6041229234567236; 0.6467984563211275;.0.6234567891321127 . do i have to change anything ?? –  May 14 '17 at 07:41
  • Good to know it works now. It should be OK for sensex index to have values like that. – Allen Qin May 14 '17 at 07:42
  • @Allen hello mate! i realized why it didnt work with the real data.! it was because my ''start date'' was not sorted in increasing order! where can i add the code line to sort my ''start date'' ?? –  May 14 '17 at 20:30
  • You can just sort it as the first step using 'df.sort_values(by=['Name','Start_Date'], inplace=True)'. But please make sure the dates columns are actually of type DateTime otherwise it will give you incorrect sorting results. – Allen Qin May 14 '17 at 21:31
  • Hello! I tried something similar, i get this error: IndexError: positional indexers are out-of-bounds –  Jul 07 '17 at 09:31