I am calculating the RSI value for a stock price where a previous row is needed for the result of the current row. I am doing it currently via for looping the full dataframe for as many times there are entries which takes a lot of time (executing time on my pc around 15 seconds).
Is there any way to improve that code?
import pandas as pd
from pathlib import Path
filename = Path("Tesla.csv")
test = pd.read_csv(filename)
data = pd.DataFrame(test[["Date","Close"]])
data["Change"] = (data["Close"].shift(-1)-data["Close"]).shift(1)
data["Gain"] = 0.0
data["Loss"] = 0.0
data.loc[data["Change"] >= 0, "Gain"] = data["Change"]
data.loc[data["Change"] <= 0, "Loss"] = data["Change"]*-1
data.loc[:, "avgGain"] = 0.0
data.loc[:, "avgLoss"] = 0.0
data["avgGain"].iat[14] = data["Gain"][1:15].mean()
data["avgLoss"].iat[14] = data["Loss"][1:15].mean()
for index in data.iterrows():
data.loc[15:, "avgGain"] = (data.loc[14:, "avgGain"].shift(1)*13 + data.loc[15:, "Gain"])/14
data.loc[15:, "avgLoss"] = (data.loc[14:, "avgLoss"].shift(1)*13 + data.loc[15:, "Loss"])/14
The used dataset can be downloaded here: TSLA historic dataset from yahoo finance
The goal is to calculate the RSI value based on the to be calculated avgGain and avgLoss value. The avgGain value on rows 0:14 are not existent. The avgGain value on row 15 is the mean value of row[1:14] of the Gain column. The avgGain value from row 16 onwards is calculated as: (13*avgGain(row before)+Gain(current row))/14