I have written a bidirectional-LSTM prediction function using NumPy (not TensorFlow or PyTorch), and I need to make it faster. The network has three layers, but for the sake of simplicity I will only show (and time) the first layer. This bi-LSTM layer works by calling the subfunctions LSTMf() and LSTMb() to process the input data (an array of 500 points) forwards and backwards. LSTMf() and LSTMb() contain loops, which I suspect take most of the time. Here is the prediction function:
import numpy as np

def predict(xt, ht, c, u, t, whff, wxff, bff, whif, wxif, bif, whlf, wxlf, blf, whof, wxof, bof, whfb,
            wxfb, bfb, whib, wxib, bib, whlb, wxlb, blb, whob, wxob, bob):
    def tanh(a):
        return np.tanh(a)
    def sig(a):
        return 1 / (1 + np.exp(-a))
    def cell(x, h, c, wh1, wx1, b1, wh2, wx2, b2, wh3, wx3, b3, wh4, wx4, b4):
        # One LSTM step: gates 1-4 are the forget, input, candidate and output gates
        new_c = c * sig(h @ wh1 + x @ wx1 + b1) + sig(h @ wh2 + x @ wx2 + b2) * tanh(h @ wh3 + x @ wx3 + b3)
        new_h = tanh(new_c) * sig(h @ wh4 + x @ wx4 + b4)
        return new_c, new_h
    def LSTMf(xt, ht, c, t, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo):
        # Forward pass over the t time steps
        h = ht[t - 1:t]
        for i in range(t):
            c, h = cell(xt[i:i + 1], h, c, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo)
            ht[i] = h
        return ht
    def LSTMb(xt, ht, c, t, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo):
        # Backward pass over the t time steps
        h = ht[0:1]
        for i in range(t - 1, -1, -1):
            c, h = cell(xt[i:i + 1], h, c, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo)
            ht[i] = h
        return ht
    # LSTM-bi 1: process the input forwards and backwards, then stack the two hidden-state sequences
    hf = LSTMf(xt, ht.copy(), c, t, whff, wxff, bff, whif, wxif, bif, whlf, wxlf, blf, whof, wxof, bof)
    hb = LSTMb(xt, ht.copy(), c, t, whfb, wxfb, bfb, whib, wxib, bib, whlb, wxlb, blb, whob, wxob, bob)
    xt = np.concatenate((hf, hb), axis=1)
    return xt
The input data and the rest of the parameters can be generated artificially with the following code:
t = 500                                     # input's number of points (time steps)
u = 64                                      # layer's number of units
xt = np.zeros((t, 1), dtype=np.float32)     # input signal
ht = np.zeros((t, u), dtype=np.float32)     # hidden-state buffer
ou = np.zeros((1, u), dtype=np.float32)     # (1, u) zeros: input weights, biases, initial cell state
uu = np.zeros((u, u), dtype=np.float32)     # (u, u) zeros: recurrent weights
weights = {'wxif':ou,'wxff':ou,'wxlf':ou,'wxof':ou,'whif':uu,'whff':uu,'whlf':uu,'whof':uu,'bif':ou,'bff':ou,'blf':ou,'bof':ou,
           'wxib':ou,'wxfb':ou,'wxlb':ou,'wxob':ou,'whib':uu,'whfb':uu,'whlb':uu,'whob':uu,'bib':ou,'bfb':ou,'blb':ou,'bob':ou}
yt = predict(xt, ht, ou, u, t, **weights)   # call example (ou is used as the initial cell state c)
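For reference, hf and hb are each of shape (t, u) and get concatenated along axis 1, so the output of this first layer has twice the number of units:
print(yt.shape)  # (500, 128)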
I have timed it (1) as plain NumPy, (2) with Numba, and (3) with Cython:
import numpy as np
from predict import predict
from predict_numba import predict_numba
from predict_cython import predict_cython
import timeit
n = 100
print(timeit.Timer(lambda: predict(xt, ht, ou, u, t, **weights)).timeit(n)/n) # 0.05198 s
predict_numba(xt, ht, ou, u, t, **weights) # Warm-up call so the timing below excludes Numba's compilation
print(timeit.Timer(lambda: predict_numba(xt, ht, ou, u, t, **weights)).timeit(n)/n) # 0.01149 s
print(timeit.Timer(lambda: predict_cython(xt, ht, ou, u, t, **weights)).timeit(n)/n) # 0.13345 s
- I would like to get this prediction below 0.03 s.
- Numba is fast enough, but I cannot afford the very slow first call (more than 30 s for the three layers); a simplified sketch of the Numba wrapper is shown after this list.
- Cython is very slow. I am not sure whether this is the cause, but, following the advice here (Cython: matrix multiplication), I did not type most of the parameters, since the '@' operator does not support memoryviews.
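For reference, the Numba version is essentially the same code with the inner functions compiled by @njit. A minimal sketch of the cell (cell_numba is just an illustrative name, and cache=True / fastmath=True are options I have not fully explored; cache=True should at least confine the slow compilation to the first run on a machine rather than to every process start):

import numpy as np
from numba import njit

@njit(cache=True, fastmath=True)   # cache the compiled machine code on disk
def cell_numba(x, h, c, wh1, wx1, b1, wh2, wx2, b2, wh3, wx3, b3, wh4, wx4, b4):
    f = 1.0 / (1.0 + np.exp(-(h @ wh1 + x @ wx1 + b1)))   # forget gate
    i = 1.0 / (1.0 + np.exp(-(h @ wh2 + x @ wx2 + b2)))   # input gate
    o = 1.0 / (1.0 + np.exp(-(h @ wh4 + x @ wx4 + b4)))   # output gate
    new_c = c * f + i * np.tanh(h @ wh3 + x @ wx3 + b3)
    new_h = np.tanh(new_c) * o
    return new_c, new_h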
Originally I was using Keras on CPU or GPU, but NumPy turned out to be faster than either. I have also heard of TorchScript, which might be applicable. What can I do to make the prediction faster?
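One NumPy-level restructuring I have considered, but not yet timed, is fusing the four gate matmuls of cell() into a single (1, u) @ (u, 4u) product per time step and hoisting the input projections out of the loop, since they do not depend on h. A rough sketch for the forward direction, reusing the argument names of LSTMf() (LSTMf_fused is just an illustrative name):

import numpy as np

def LSTMf_fused(xt, ht, c, t, whf, wxf, bf, whi, wxi, bi, whl, wxl, bl, who, wxo, bo):
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    # Stack the gate weights so each step needs one matmul against h instead of four
    wh_all = np.concatenate((whf, whi, whl, who), axis=1)   # (u, 4u)
    wx_all = np.concatenate((wxf, wxi, wxl, wxo), axis=1)   # (1, 4u)
    b_all = np.concatenate((bf, bi, bl, bo), axis=1)        # (1, 4u)
    gx = xt @ wx_all + b_all     # input projections for all t steps at once: (t, 4u)
    h = ht[t - 1:t]
    for i in range(t):
        g = h @ wh_all + gx[i:i + 1]              # one (1, u) @ (u, 4u) matmul
        f, ig, l, o = np.split(g, 4, axis=1)      # forget, input, candidate, output pre-activations
        c = c * sig(f) + sig(ig) * np.tanh(l)
        h = np.tanh(c) * sig(o)
        ht[i] = h
    return ht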
___
Context: this function predicts the R-peaks in an ECG window, and it is meant to be called as frequently as possible in order to track the R-peaks of an ECG being acquired in real time.
PS. In case you want to make sense of the calculations, this description of how an LSTM cell works might be of use: https://i.stack.imgur.com/jgNd2.jpg
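Concretely, cell() implements the standard LSTM update, with (wh1, wx1, b1) through (wh4, wx4, b4) being the forget, input, candidate and output gate parameters, ⊙ element-wise multiplication and σ the logistic sigmoid (this is just the code above restated as equations):

$$c_t = c_{t-1} \odot \sigma(h_{t-1} W_{h1} + x_t W_{x1} + b_1) + \sigma(h_{t-1} W_{h2} + x_t W_{x2} + b_2) \odot \tanh(h_{t-1} W_{h3} + x_t W_{x3} + b_3)$$
$$h_t = \tanh(c_t) \odot \sigma(h_{t-1} W_{h4} + x_t W_{x4} + b_4)$$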