I am creating a backtesting program in Python. At the moment I need a really consistent speed-up. With Cython I achieved a 200x speed-up, but it's not enough: if I ran my code on all of my data, it would still take around 16 hours, and I would probably need to run it multiple times.
I have used cProfile on my code and found that this function accounts for around 98-99% of the total run time.
import numpy as np
cimport cython
cimport numpy as np
np.import_array()
@cython.wraparound(False)
@cython.boundscheck(False)
@cython.cdivision(True)
cdef tp_sl_back_test(np.ndarray[np.float64_t, ndim=2] data, double tp, double sl):
    cdef double balance = 100
    cdef double balance_copy
    cdef Py_ssize_t i
    cdef int right = 0
    cdef int total = 0
    cdef double entry_price
    cdef double close_price
    cdef double high_price
    cdef double low_price
    cdef double tp_price
    cdef double sl_price
    for i in range(data.shape[0]):
        balance_copy = balance
        entry_price = data[i, 0]
        high_price = data[i, 1]
        low_price = data[i, 2]
        close_price = data[i, 3]
        tp_price = entry_price + ((entry_price / 100) * tp)
        sl_price = entry_price - ((entry_price / 100) * sl)
        if (sl_price < low_price) and (tp_price > high_price):
            pass
        elif (sl_price >= low_price) and (tp_price > high_price):
            close_price = sl_price
        elif (sl_price < low_price) and (tp_price <= high_price):
            close_price = tp_price
        else:
            close_price = sl_price
        balance *= 0.9996
        balance += ((close_price - entry_price) * (balance / entry_price))
        balance *= 0.9996
        if balance_copy < balance:
            right += 1
            total += 1
        else:
            total += 1
    return balance, right, total
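For reference, the three balance updates inside the loop collapse algebraically to a single multiplication (`balance *= 0.9996**2 * close_price / entry_price`), so the whole per-row loop can be vectorised in plain NumPy. A sketch, assuming the same open/high/low/close column layout as above; `tp_sl_back_test_np` is a hypothetical name, not part of the original code:

```python
import numpy as np

def tp_sl_back_test_np(data, tp, sl):
    # Columns assumed: 0=entry (open), 1=high, 2=low, 3=close.
    entry = data[:, 0]
    high = data[:, 1]
    low = data[:, 2]
    close = data[:, 3].copy()

    tp_price = entry * (1.0 + tp / 100.0)
    sl_price = entry * (1.0 - sl / 100.0)

    hit_sl = sl_price >= low   # stop-loss level reached
    hit_tp = tp_price <= high  # take-profit level reached

    # Same branch logic as the loop: the stop-loss wins when both are hit.
    close = np.where(hit_tp & ~hit_sl, tp_price, close)
    close = np.where(hit_sl, sl_price, close)

    # balance *= 0.9996; balance += (close-entry)*balance/entry; balance *= 0.9996
    # is the same as multiplying balance by 0.9996**2 * close / entry.
    factors = (0.9996 ** 2) * close / entry
    balance = 100.0 * np.prod(factors)
    right = int(np.count_nonzero(factors > 1.0))
    return balance, right, factors.size
```

Because a trade is "right" exactly when its per-row factor exceeds 1, `right` can be counted without tracking the running balance at all.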
I am new to Cython and don't know many optimisation techniques. Maybe my code cannot be optimised much further.
I have tried changing np.ndarray[np.float64_t, ndim=2] data
to double[:, :]
but it had almost no effect.
I need at least an 800x speed-up in order to achieve a satisfying result.
Any criticism is welcome.
Thanks to everyone in advance.
EDIT:
I have done some changes and the code now looks like this:
@cython.wraparound(False)
@cython.boundscheck(False)
@cython.cdivision(True)
cdef tp_sl_back_test(double[:, :] data, double tp, double sl):
    cdef double balance = 100
    cdef double balance_copy
    cdef Py_ssize_t i
    cdef int right = 0
    cdef int total = 0
    cdef double entry_price
    cdef double close_price
    cdef double high_price
    cdef double low_price
    cdef double tp_price
    cdef double sl_price
    for i in range(data.shape[0]):
        balance_copy = balance
        entry_price = data[i, 0]
        high_price = data[i, 1]
        low_price = data[i, 2]
        close_price = data[i, 3]
        tp_price = entry_price + ((entry_price * 0.01) * tp)
        sl_price = entry_price - ((entry_price * 0.01) * sl)
        if (sl_price < low_price) and (tp_price > high_price):
            pass
        elif sl_price >= low_price:
            close_price = sl_price
        elif tp_price <= high_price:
            close_price = tp_price
        else:
            close_price = sl_price
        balance *= 0.9996
        balance += ((close_price - entry_price) * (balance / entry_price))
        balance *= 0.9996
        if balance_copy < balance:
            right += 1
            total += 1
        else:
            total += 1
    return balance, right, total
I have tried to add some optimisation flags, but I get a D9002 error saying that -O3 is an unknown flag; at the moment I'm trying to fix it.
Still haven't tried Numba, but I will soon.
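(For reference: D9002 is MSVC's "ignoring unknown option" diagnostic; MSVC does not accept GCC-style flags, and its closest equivalent of -O3 is /O2. A hedged setup.py sketch that picks the flag per platform; the module name back_test.pyx is hypothetical:)

```python
# Hypothetical setup.py: MSVC rejects GCC-style -O3 (warning D9002),
# so choose the optimisation flag based on the platform in use.
import sys
from setuptools import Extension, setup
from Cython.Build import cythonize

if sys.platform == "win32":
    compile_args = ["/O2"]   # MSVC's standard maximum-speed optimisation
else:
    compile_args = ["-O3"]   # GCC/Clang

extensions = [
    Extension("back_test", ["back_test.pyx"],
              extra_compile_args=compile_args),
]

setup(ext_modules=cythonize(extensions, language_level=3))
```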
Kelly Bundy asked for some data information, and here it is:
- The data I'm testing on is 2892 rows long, but that is the data of 1 asset on 1 timeframe (the 4-hour timeframe), and I would need to test this with different parameters.
- The function above is called 20,000 times inside another Cython function.
- The open_time, volume and color columns are not in the data array
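Since each of the 20,000 (tp, sl) combinations is independent, the outer sweep can itself be vectorised: the balance update reduces to multiplying by `0.9996**2 * close/entry` per bar, so a chunk of combinations can be evaluated at once with broadcasting. A sketch only, under the same open/high/low/close column assumption; `sweep_tp_sl` is a hypothetical name, and the chunking exists purely to bound memory to roughly chunk × rows doubles per temporary:

```python
import numpy as np

def sweep_tp_sl(data, tp_sl_mash, chunk=256):
    """Evaluate every (tp, sl) pair; returns (balances, rights) arrays."""
    entry, high, low, close = (data[:, j] for j in range(4))
    n = tp_sl_mash.shape[0]
    balances = np.empty(n)
    rights = np.empty(n, dtype=np.int64)

    for start in range(0, n, chunk):
        tp = tp_sl_mash[start:start + chunk, 0:1]  # shape (c, 1)
        sl = tp_sl_mash[start:start + chunk, 1:2]
        tp_price = entry * (1.0 + tp / 100.0)      # broadcasts to (c, rows)
        sl_price = entry * (1.0 - sl / 100.0)
        hit_sl = sl_price >= low
        hit_tp = tp_price <= high
        # Same branching as the loop: the stop-loss wins when both are hit.
        eff_close = np.where(hit_sl, sl_price,
                             np.where(hit_tp, tp_price, close))
        factors = (0.9996 ** 2) * eff_close / entry
        balances[start:start + chunk] = 100.0 * np.prod(factors, axis=1)
        rights[start:start + chunk] = np.count_nonzero(factors > 1.0, axis=1)
    return balances, rights
```

With 2892 rows and 20,000 combinations this replaces roughly 58 million Python-level loop iterations with a few hundred large array operations.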
EDIT 2:
This is the full .pyx file: https://pastebin.com/UwEUH5EK
tp_sl_mash is just an array of tp (take profit) and sl (stop loss) combinations; it is built from the biggest price differences:
high_max, low_max = find_min_max(data)
tp_array = np.linspace(0, (high_max + 2 * (high_max / lenght_array)), lenght_array)
sl_array = np.linspace(0, (low_max + 2 * (low_max / lenght_array)), lenght_array)
tp_sl_mash = np.array(np.meshgrid(tp_array, sl_array)).T.reshape(-1, 2)
def find_min_max(df):
    df['Percent_High'] = df['High_Price'] / df['Open_Price']
    df['Percent_Low'] = df['Low_Price'] / df['Open_Price']
    df_high_max = (df['Percent_High'].max() - 1) * 100
    df_low_max = abs((1 - df['Percent_Low'].min())) * 100
    return df_high_max, df_low_max
You can download the test data from this link: https://drive.google.com/file/d/1-kuFJjRRDrEnclgIdoDvMt-dOVlPqPwi/view?usp=sharing
The main testing file: https://pastebin.com/HVyU8xJy