I was trying to implement 101 quant trading factors that were published by WorldQuant (https://arxiv.org/pdf/1601.00991.pdf).
A typical factor is about processing stocks' price and volume information along with both time dimension and stock dimension. Take the example of alpha factor #4: (-1 * Ts_Rank(rank(low), 9)). This is a momentum alpha signal. low is a panel of stocks' low price within a certain time period. rank is a cross-sectional process of ranking panel’s each row (a time snapshot). Ts_Rank is a time-series process of moving_rank panel’s each column (a stock) with a specified window.
Intuitively, the Pandas dataframe or NumPy matrix should fit the implementation of 101 alpha factors. Below is the best implementation using NumPy I got so far. However, the performance was too low. On my Intel core i7 windows machine, it took around 45 seconds to run the alpha #4 factor with a 5000 (trade dates) by 200 (stocks) matrix as input.
I also came across DolphinDB, a time series database with built-in analytics features (https://www.dolphindb.com/downloads.html). For the same factor Alpha#4, DolphinDB ran for mere 0.04 seconds, 1000 times faster than the NumPy version. However, DolphinDB is commercial software. Does anybody know better python implementations? Or any tips to improve my current python code to achieve performance comparable to DolphinDB?
Numpy implementation (based on https://github.com/yli188/WorldQuant_alpha101_code)
import numpy as np
def rankdata(a, method='average', *, axis=None):
# this rankdata refer to scipy.stats.rankdata (https://github.com/scipy/scipy/blob/v1.9.1/scipy/stats/_stats_py.py#L9047-L9153)
if method not in ('average', 'min', 'max', 'dense', 'ordinal'):
raise ValueError('unknown method "{0}"'.format(method))
if axis is not None:
a = np.asarray(a)
if a.size == 0:
np.core.multiarray.normalize_axis_index(axis, a.ndim)
dt = np.float64 if method == 'average' else np.int_
return np.empty(a.shape, dtype=dt)
return np.apply_along_axis(rankdata, axis, a, method)
arr = np.ravel(np.asarray(a))
algo = 'mergesort' if method == 'ordinal' else 'quicksort'
sorter = np.argsort(arr, kind=algo)
inv = np.empty(sorter.size, dtype=np.intp)
inv[sorter] = np.arange(sorter.size, dtype=np.intp)
if method == 'ordinal':
return inv + 1
arr = arr[sorter]
obs = np.r_[True, arr[1:] != arr[:-1]]
dense = obs.cumsum()[inv]
if method == 'dense':
return dense
# cumulative counts of each unique value
count = np.r_[np.nonzero(obs)[0], len(obs)]
if method == 'max':
return count[dense]
if method == 'min':
return count[dense - 1] + 1
# average method
return .5 * (count[dense] + count[dense - 1] + 1)
def rank(x):
return rankdata(x,method='min',axis=1)/np.size(x, 1)
def rolling_rank(na):
return rankdata(na.transpose(),method='min',axis=0)[-1].transpose()
def ts_rank(x, window=10):
a_rolled = np.lib.stride_tricks.sliding_window_view(x, window,axis = 0)
return np.append(np.full([window-1,np.size(x, 1)],np.nan),rolling_rank(a_rolled),axis = 0)
def alpha004(data):
return -1 * ts_rank(rank(data), 9)
import time
# The input is a 5000 by 200 matrix, where the row index represents trade date and the column index represents security ID.
data=np.random.random((5000, 200))
start_time = time.time()
alpha004(data)
print("--- %s seconds ---" % (time.time() - start_time))
--- 44.85099506378174 seconds ---
DolphinDB implementation
def WQAlpha4(low){
return -mrank(rowRank(low, percent=true), true, 9)
}
// The input is a 5000 by 200 matrix, where the row index represents trade date and the column index represents security ID.
low = rand(1000.0,5000:200);
timer WQAlpha4(low);
Time elapsed: 44.036 ms (0.044s)