I have a code that runs really quickly in Matlab (within few seconds). I had to port the code to python for making it more open-source friendly. This code has the dreaded slow 'for' loop which is nested with cascade update. Takes about 45 mins to run the same code. However, I am unable to speed up these loops by rearranging the calculations or even trying to introduce multiprocessing as I only have Ryzen 5 laptop. I am posting here the code snippet where the code spends maximum time. Can anyone please suggest an optimization technique to speed this up.
Python code:
from scipy import signal
from matplotlib import *
from numpy import zeros, ones, multiply, array, arange
import time
fs = 16000
npoints = 16000
Ntem = 30
# create a log-sine-sweep
f0 = 10 # sweep start frequency
f1 = fs/2 # sweep end frequency
t1 = arange(npoints)/fs # sample times
stimulus = signal.chirp(t1, f0, t1[-1], f1, method='logarithmic', phi=-90)
W0 = zeros((Ntem, npoints))
W1 = zeros((Ntem, npoints))
BM = zeros((Ntem, npoints))
BM_d = zeros((Ntem, npoints))
a0rf = array([-0.721112,-0.532501,-0.335426,-0.143986,0.0333392,0.192211,0.331123,
0.45037,0.551292,0.635753,0.705805,0.763481,0.810679,0.849101,0.880237,0.905366,
0.925569,0.941754,0.954673,0.964947,0.973084,0.979502,0.984539,0.988471,0.99152,
0.993866,0.995654,0.997002,0.998002,0.998729])
c0rf = array([0.692818,0.84643,0.942067,0.98958,0.999444,0.981354,0.943588,0.892842,
0.834312,0.771893,0.708406,0.64583,0.585491,0.528231,0.474535,0.424633,0.378578,
0.336301,0.297656,0.262447,0.230451,0.201436,0.175166,0.151413,0.129957,0.110592,
0.0931255,0.0773789,0.0631885,0.0504047])
gf = array([0.879889,0.813389,0.75372,0.703499,0.662554,0.629685,0.603486,0.582661,
0.566118,0.552971,0.542517,0.534197,0.527574,0.522299,0.518099,0.514757,0.512099,
0.509988,0.508315,0.50699,0.505945,0.505123,0.504478,0.503976,0.503586,0.503286,
0.503056,0.502883,0.502754,0.50266])
ghf = array([0.692818,0.84643,0.942067,0.98958,0.999444,0.981354,0.943588,0.892842,
0.834312,0.771893,0.708406,0.64583,0.585491,0.528231,0.474535,0.424633,0.378578,
0.336301,0.297656,0.262447,0.230451,0.201436,0.175166,0.151413,0.129957,0.110592,
0.0931255,0.0773789,0.0631885,0.0504047])
start_time = time.time()
for t in range(npoints-1):
for s in range(Ntem):
if (s == 0):
stim = stimulus[t]
else:
stim = BM[s-1, t]
W0[s, t] = stim + a0rf[s]*W0[s, t-1] - c0rf[s]*W1[s, t-1]
W1[s, t] = a0rf[s]*W1[s, t-1] + c0rf[s]*W0[s, t-1]
BM[s, t] = gf[s]*stim + ghf[s]*W1[s, t]
BM_d[s, t] = BM[s, t] - BM[s, t-1]
elapsed_time = time.time() - start_time
print('time cost = ',elapsed_time)
This code takes 2.8142993450164795 secs
Matlab code:
fs = 16000.0 ; % sample frequency
npoints = 16000 ; % stimulus length
Ntem = 30;
f0 = 10 ; % sweep start frequency
f1 = fs/2 ; % sweep end frequency
t1 = 1:npoints ;
t1= t1./fs ; % sample times
gain = 0.1;
stimulus = gain*chirp(t1, f0, t1(npoints), f1, 'logarithmic', -90);
stimulus = (stimulus);
W0 = zeros(Ntem,npoints);
W1 = zeros(Ntem,npoints);
BM = zeros(Ntem,npoints);
BM_d = zeros(Ntem,npoints);
stim=0;
a0rf = [-0.721112 -0.532501 -0.335426 -0.143986 0.0333392 0.192211 0.331123 ...
0.45037 0.551292 0.635753 0.705805 0.763481 0.810679 0.849101 0.880237 ...
0.905366 0.925569 0.941754 0.954673 0.964947 0.973084 0.979502 0.984539 ...
0.988471 0.99152 0.993866 0.995654 0.997002 0.998002 0.998729];
c0rf = [0.692818 0.84643 0.942067 0.98958 0.999444 0.981354 0.943588 0.892842 ...
0.834312 0.771893 0.708406 0.64583 0.585491 0.528231 0.474535 0.424633 0.378578 ...
0.336301 0.297656 0.262447 0.230451 0.201436 0.175166 0.151413 0.129957 0.110592 ...
0.0931255 0.0773789 0.0631885 0.0504047];
gf = [0.879889 0.813389 0.75372 0.703499 0.662554 0.629685 0.603486 0.582661 ...
0.566118 0.552971 0.542517 0.534197 0.527574 0.522299 0.518099 0.514757 0.512099 ...
0.509988 0.508315 0.50699 0.505945 0.505123 0.504478 0.503976 0.503586 0.503286 ...
0.503056 0.502883 0.502754 0.50266];
ghf = [0.692818 0.84643 0.942067 0.98958 0.999444 0.981354 0.943588 0.892842 ...
0.834312 0.771893 0.708406 0.64583 0.585491 0.528231 0.474535 0.424633 0.378578 ...
0.336301 0.297656 0.262447 0.230451 0.201436 0.175166 0.151413 0.129957 0.110592 ...
0.0931255 0.0773789 0.0631885 0.0504047];
tic
for t=2:(npoints)
for s=1:(Ntem)
if (s==1)
stim = stimulus(t);
else
stim = BM(s-1,t);
end
W0(s,t) = stim + a0rf(s)*W0(s,t-1) - c0rf(s)*W1(s,t-1);
W1(s,t) = a0rf(s)*W1(s,t-1) + c0rf(s)*W0(s,t-1);
BM(s,t) = gf(s)*stim + ghf(s)*W1(s,t);
BM_d(s,t) = BM(s,t) - BM(s,t-1);
end
end
toc
This code takes 0.091413 secs
So, the stimulus here I have a fixed value. In my original code, I have an array of 160 stimulus inputs, which would increase the time further for these codes. That just compounds the delay in python code further.