You should really use a vectorized approach if you care about speed. Looping over the rows will always be the slowest option (though sometimes it's unavoidable).
You don't even need to change your function to vectorize it!
```python
def calculate_ema(prev_ema, price, period):
    return prev_ema + (2.0 / (1.0 + period)) * (price - prev_ema)
```
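As a quick scalar sanity check (a standalone sketch; the period of 12 is just an assumed value for `EMA_Period.fast`), one smoothing step moves the previous EMA a fraction `2 / (1 + period)` of the way toward the new price:

```python
def calculate_ema(prev_ema, price, period):
    # smoothing factor alpha = 2 / (1 + period)
    return prev_ema + (2.0 / (1.0 + period)) * (price - prev_ema)

# one update step: alpha = 2/13, so the EMA moves 2/13 of the
# way from 10278.6 toward 10276.0, i.e. down by 0.4 to ~10278.2
print(calculate_ema(10278.6, 10276.0, 12))
```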
```python
# though we will make your dataframe longer: 500 rows instead of 5
df = pd.concat([df] * 100)
print(df)
```

```
              close  fast_ema
time
1578614400  10278.6   10278.6
1578614700  10276.0       0.0
1578615000  10275.6       0.0
1578615300  10274.8       0.0
1578615600  10277.0       0.0
...             ...       ...
1578614400  10278.6   10278.6
1578614700  10276.0       0.0
1578615000  10275.6       0.0
1578615300  10274.8       0.0
1578615600  10277.0       0.0

[500 rows x 2 columns]
```
Note that these tests time two things:
- the calculation itself
- assigning the values back into the dataframe
Row Looping Solution
```python
%%timeit
for i in range(1, df.shape[0]):
    # index the column first: df.iloc[i]['fast_ema'] = ... would assign
    # into a temporary row copy and silently discard the result
    df["fast_ema"].iloc[i] = calculate_ema(df["fast_ema"].iloc[i - 1], df["close"].iloc[i], EMA_Period.fast)
```

```
10 loops, best of 5: 86.1 ms per loop
```
86.1 ms is pretty slow for such a small dataset. Let's see how the vectorized approach compares:
Vectorized Solution
- By using `.shift()` on the "fast_ema" column, we can realign the vectors so that each value in "close" lines up with the previous "fast_ema" value.
- With the alignment taken care of, we can feed these vectors directly into the `calculate_ema` function without making any changes.
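To see what `.shift()` does to the alignment, here is a minimal standalone sketch using the first few values from the dataframe above:

```python
import pandas as pd

fast_ema = pd.Series([10278.6, 10276.0, 10275.6])

# shift() pushes every value down one slot, so position i now holds
# the value that was at position i-1; position 0 becomes NaN
print(fast_ema.shift())
```

Element-wise operations between the shifted series and `close` then pair each price with the EMA value from the row before it.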
```python
%%timeit
df["fast_ema"].iloc[1:] = calculate_ema(df["fast_ema"].shift(), df["close"], EMA_Period.fast).iloc[1:]
```

```
1000 loops, best of 5: 569 µs per loop
```
Time comparison:

| Approach    | Time    |
|-------------|---------|
| Row Looping | 86.1 ms |
| Vectorized  | 569 µs  |
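As an aside, pandas also ships a built-in exponentially weighted helper, `Series.ewm`, which computes a full EMA (including the recursive dependence on every prior value) in compiled code. With `adjust=False` it uses the same `alpha = 2 / (1 + period)` recurrence as `calculate_ema`. A minimal sketch, assuming a fast period of 12:

```python
import pandas as pd

close = pd.Series([10278.6, 10276.0, 10275.6, 10274.8, 10277.0])
period = 12  # assumed value for EMA_Period.fast

# span=period gives alpha = 2 / (period + 1); adjust=False applies the
# plain recurrence ema[t] = (1 - alpha) * ema[t-1] + alpha * close[t]
ema = close.ewm(span=period, adjust=False).mean()
print(ema)
```

This is worth reaching for when you want the true recursive EMA over the whole column rather than a single smoothing step per row.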