
Is there a way to optimize the code snippet below? I am trying to calculate each row's value in a column from the previous row's value in that same column, a period specified in the custom function, and the current row's price.

import pandas as pd

class EMA_Period:
    fast = 8
    slow = 17

def calculate_ema(prev_ema, price, period):
    return prev_ema + (2.0 / (1.0 + period)) * (price - prev_ema)

times = [1578614400, 1578614700, 1578615000, 1578615300, 1578615600]
closes = [10278.6, 10276.0, 10275.6, 10274.8, 10277.0]
fast_ema = [10278.6, 0, 0, 0, 0]

df = pd.DataFrame(data={'time': times, 'close': closes, 'fast_ema': fast_ema})

df.set_index('time', inplace=True)

for i in range(1, df.shape[0]):
    df.iloc[i]['fast_ema'] = calculate_ema(df.iloc[i-1]['fast_ema'], df.iloc[i]['close'], EMA_Period.fast)
  • See ["Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?"](https://stackoverflow.com/questions/34855859/is-there-a-way-in-pandas-to-use-previous-row-value-in-dataframe-apply-when-previ) – Mars Buttfield-Addison Jul 21 '21 at 13:14
  • @MarsButtfield-Addison Thanks, the last answer worked for me and is way faster – Masilive Sifanele Jul 21 '21 at 17:38

2 Answers


Thanks @Mars, the approach from the linked question worked for me:

def calc_ema(df, period=8, col_name='fast_ema'):
    prev_value = df.iloc[0][col_name]

    def func2(row):
        # nonlocal ==> keep updating the running prev_value defined in calc_ema
        nonlocal prev_value
        prev_value = prev_value + (2.0 / (1.0 + period)) * (row['close'] - prev_value)
        return prev_value

    # Assign by position to avoid chained indexing (and its SettingWithCopyWarning)
    df.iloc[1:, df.columns.get_loc(col_name)] = df.iloc[1:].apply(func2, axis=1).to_numpy()
    return df
df = calc_ema(df)
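
For instance, the same helper can be reused for the slow period. This is only a sketch and not part of the original answer; the 'slow_ema' column and its seeding with the first close are assumptions:

# Hypothetical usage (sketch): seed a 'slow_ema' column, then reuse calc_ema for the slow period
df['slow_ema'] = df['close'].iloc[0]  # assumed seed: the first close, same as fast_ema
df = calc_ema(df, period=EMA_Period.slow, col_name='slow_ema')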

You should really use a vectorized approach if you care about speed. Looping over the rows will always be the slowest option (though it is sometimes unavoidable).

You don't even need to change your function to make it vectorized!

def calculate_ema(prev_ema, price, period):
    return prev_ema + (2.0 / (1.0 + period)) * (price - prev_ema)

# though we will make your dataframe longer: 500 rows instead of 5 rows
df = pd.concat([df] * 100)

print(df)
              close  fast_ema
time                         
1578614400  10278.6   10278.6
1578614700  10276.0       0.0
1578615000  10275.6       0.0
1578615300  10274.8       0.0
1578615600  10277.0       0.0
...             ...       ...
1578614400  10278.6   10278.6
1578614700  10276.0       0.0
1578615000  10275.6       0.0
1578615300  10274.8       0.0
1578615600  10277.0       0.0

[500 rows x 2 columns]

Note that these tests are timing 2 important things:

  • Performance of the calculation itself
  • Performance of assigning values back into a dataframe

Row looping solution

%%timeit
for i in range(1, df.shape[0]):
    df.iloc[i]['fast_ema'] = calculate_ema(df.iloc[i-1]['fast_ema'], df.iloc[i]['close'], EMA_Period.fast)

10 loops, best of 5: 86.1 ms per loop

86.1 ms is pretty slow for such a small dataset. Let's see how the vectorized approach compares:


Vectorized Solution

  • By using .shift() on the "fast_ema" column we can change how these vectors align such that each value in "close" is aligned with the previous "fast_ema".
  • With the alignment taken care of (a quick check is sketched after this list), we can feed these vectors directly into the calculate_ema function without making any changes.
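
As a quick sanity check (a sketch, not part of the original answer; the alignment_check frame and the use of .to_numpy() are just for display), the alignment produced by .shift() can be inspected before timing:

# Sketch: each close sits next to the previous row's fast_ema it will be combined with
alignment_check = pd.DataFrame({
    "prev_fast_ema": df["fast_ema"].shift().to_numpy(),
    "close": df["close"].to_numpy(),
})
print(alignment_check.head())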
%%timeit 
df["fast_ema"].iloc[1:] = calculate_ema(df["fast_ema"].shift(), df["close"], EMA_Period.fast).iloc[1:]

1000 loops, best of 5: 569 µs per loop
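
Note that this 569 µs still covers both of the costs listed above. As a rough sketch (not from the original answer), the calculation could be timed on its own by dropping the assignment:

%%timeit
# Calculation only: no values assigned back into the dataframe
calculate_ema(df["fast_ema"].shift(), df["close"], EMA_Period.fast)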

Time comparisons:

Approach        Time
Row Looping     86.1 ms
Vectorized      569 µs
Cameron Riddell