I have the following for function:
def calculateEMAs(df,startIndex,endIndex):
for index,row in df.iterrows():
for i in range (1,51):
if(index-i > 0):
df.loc[index,"EMA%d"%i] = abs(df.iloc[index-i]["Trade Close"] - df.iloc[index]["Trade Close"])/2 #replace this with EMA formula
print(df)
This for loop takes a long time to calculate the values for the data frame as it has to loop 50 times for each row (it takes approximately 62 seconds)
I tried to use multiprocessor pool from this question. My code looks like this now:
def calculateEMAs(df,startIndex,endIndex):
for index,row in df.iterrows():
for i in range (startIndex,endIndex):
if(index-i > 0):
df.loc[index,"EMA%d"%i] = abs(df.iloc[index-i]["Trade Close"] - df.iloc[index]["Trade Close"])/2 #replace this with EMA formula
print(df)
def main():
dfClosePrice= getFileDataframe().to_frame()
pool = Pool()
time0 = time.time()
result1 = pool.apply_async(calculateEMAs,[dfClosePrice,1,10])
result2 = pool.apply_async(calculateEMAs,[dfClosePrice,10,20])
result3 = pool.apply_async(calculateEMAs,[dfClosePrice,20,30])
result4 = pool.apply_async(calculateEMAs,[dfClosePrice,30,40])
result5 = pool.apply_async(calculateEMAs,[dfClosePrice,40,51])
answer1 = result1.get()
answer2 = result2.get()
answer3 = result3.get()
answer4 = result4.get()
answer5 = result5.get()
print(time.time() - time0)
print(dfClosePrice)
I run the function asynchronously with different values for the for loop. this takes 19 seconds to complete and I can see the result of each function printed correctly but the final value of dfClosePirce
is a dataframe with only 1 column (Trade Close) and the new columns from each async function will not be added to the dataframe. How can I do it the right way?