0

I have a dataframe with a column of values (column 'One'). For each pair of values given by (product(sma1, sma2)), I want to:

  • calculate 2 simple moving average using SMA1, and SMA2
  • When condition A or B is met, append to results the value of SMA1, SMA2, and price

The code below shows my approach. Is it possible to do the same thing removing the nested for loop (with apply maybe)? would that be faster / better?

from itertools import product 
import pandas as pd 


data = pd.DataFrame( [3,2,5,8,5,12,7,8,9,10,11,12,13,14,17,19,15,18] ,columns=['One'])
sma1 = range(2, 5)  
sma2 = range(5, 8)
results=pd.DataFrame()

for SMA1, SMA2 in product(sma1, sma2): #for each pair SMA1, SMA2, perfomr the following
    data['SMA1'] = data['One'].rolling(SMA1).mean()
    data['SMA2'] = data['One'].rolling(SMA2).mean()
    
    data.dropna(inplace=True)
    
    for i in range( len(data)):
        #Condition A:
        if data.iloc[i, 1] > data.iloc[i, 2] and data.iloc[i-1, 1] < data.iloc[i-1,2]: 
          #Buy
          price = data.iloc[i, 0]
        #Condition B
        elif data.iloc[i, 1] < data.iloc[i, 2] and data.iloc[i-1, 1] > data.iloc[i-1,2]:
          #Sell
          price = data.iloc[i, 0] 
        #if neither condition is true, dont append anything 
        else: 
          continue
          
        results = results.append(pd.DataFrame(
              {'SMA1': SMA1,
               'SMA2': SMA2,
               'price': price,
               
              },
               index=[0]), ignore_index = True)

    
results

Out:

    SMA1 SMA2   price
0   2    5      5
1   2    5      8
2   2    5      9
3   2    5      18

EIDT from Hugolmn's answer I see the issue with using [i-1]. Howver with your approach I am not getting the right answer, I have simpliefied the probelm to just finding if the conditions A, B are satisfied for 2 columns.

data = pd.DataFrame({'col1': [3,2,6,8,6,11], 'col2': [3,3,5,8,5,12]})

#columns with shifted data
data['col1_shift']=data['col1'].shift(1)
data['col2_shift']=data['col2'].shift(1)

#condition A  
data['Con1_col1>col2'] = data['col1']>data['col2']
data['Con1_col1<col2_shift'] = data['col1_shift'] < data['col2_shift']
data['ConA'] = data['Con1_col1>col2'] & data['Con1_col1<col2_shift']

#condition B
data['Con2_col1<col2'] = data['col1']<data['col2']
data['Con2_col1>col2_shift'] = data['col1_shift'] > data['col2_shift']
data['ConB'] = data['Con2_col1<col2'] & data['Con2_col1>col2_shift']


# data['part1']= ((data.col1 - data.col2) > 0)
data['Hugolmn_method']= ((data.col1 - data.col2) >= 0).diff() > 0
data['expected']= data['ConA'] | data['ConB']

data

Out:

   col1  col2  col1_shift  col2_shift  Con1_col1>col2  Con1_col1<col2_shift  \
0     3     3         NaN         NaN           False                 False   
1     2     3         3.0         3.0           False                 False   
2     6     5         2.0         3.0            True                  True   
3     8     8         6.0         5.0           False                 False   
4     6     5         8.0         8.0            True                 False   
5    11    12         6.0         5.0           False                 False   

    ConA  Con2_col1<col2  Con2_col1>col2_shift   ConB  Hugolmn_method  \
0  False           False                 False  False           False   
1  False            True                 False  False            True   
2   True           False                 False  False            True   
3  False           False                  True  False           False
4  False           False                 False  False           False
5  False            True                  True   True            True   

   expected  
0     False  
1     False  
2      True  
3     False  
4     False  
5      True  

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Leo
  • 1,176
  • 1
  • 13
  • 33
  • Hi, it's always best to avoid handwritting loops in pandas. You can check this method https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column – Hugolmn Jun 21 '20 at 18:08
  • thanks, i'm not clear whether the for with itertool prodcuts is fast or not though ? what woudl be a better approach, cause i cant think of a way to use "where" – Leo Jun 21 '20 at 18:38
  • 1
    Hi! I just saw your edit and modified my answer. Now it works fine! It was just a matter of using `>=` instead of `>`. Check it out ! – Hugolmn Jun 23 '20 at 21:41

1 Answers1

0

I inspected your code and come up with these comments:

  • You don't need a loop to iterate over data
  • You can append a dict to a dataframe directly
  • You're result is wrong for price=5, it was comparing the index 0 with the index -1, so between the first and last row of data. This is probably not an expected behavior

Instead, I would suggest you to use .shift() to do tests with previous rows.

data.iloc[i, 1] > data.iloc[i, 2] and data.iloc[i-1, 1] < data.iloc[i-1,2]

# would become
(data.SMA1 > data.SMA2) & (data.SMA1.shift() < data.SMA2.shift())
Hugolmn
  • 1,530
  • 1
  • 7
  • 20
  • I've updated the code with : data['Hugolmn_method']= ((data.col1 - data.col2) >= 0).diff() > 0, unfurtunately this still doesn't work as it fails on the second row, when the first one is equal: before it gave false, now True. – Leo Jun 24 '20 at 10:56
  • intersting code thoguh, it is a weaker condition than before but it could be useuf! – Leo Jun 24 '20 at 10:59
  • Your calculation for the first line is not relevant : you cannot compare with the previous line. That is why I added the `dropna`. This way, you get the expected result, and a full respect of your conditions :) – Hugolmn Jun 24 '20 at 11:24
  • But for line 2: Val(2,3) and line 1Val(3,3) I expect false. since both conditions cannot be met when the previous row has the same value ( 3,3) . Whilst your answer gives true. – Leo Jun 24 '20 at 13:37
  • This is precisely why I droped the first line : the decision is based on the difference between two rows. Therefore, if the first row is not giving relevant decision, it will bias the next one. You need to use dropna() to avoid such behavior. Could you try again using dropna ? – Hugolmn Jun 24 '20 at 14:20
  • I dont think the issue is that it is at the start of the code. The issue is that your function evaluates a pair of same numbers as True, whilst mine doesn't.Look what happens with these new collumns: 'col1': [3,2,6,8,4,11], 'col2': [3,3,5,8,5,12] -- i change col1[4] to 4 so that it is less than col2[4]. I expect this to evaluate to False, as col1[3]=col2[3]=8. But your method gives True. These numbers are in the middle of the list, so wouldn't have an NA put there from the rolling average. – Leo Jun 24 '20 at 15:09
  • 1
    You're right. Not using dropna fixed the problem for the first line, but not the next ones. I will try to fix this :) – Hugolmn Jun 24 '20 at 15:24
  • 1
    I retracted my solution. The best I can come up with is using `(data.SMA1 > data.SMA2) & (data.SMA1.shift() < data.SMA2.shift())` – Hugolmn Jun 24 '20 at 16:17