I was testing the execution speed of Python (3.8.6). The test case was matrix-vector multiplication, with a 10,000 x 10,000 matrix and a vector of length 10,000. Both were filled with random floats.
First, I tried this code:
import time
import numpy as np


def str_per_vec(a, b, n):
    res = 0
    for i in range(n):
        res += a[i] * b[i]
    return res


N = 10000
A = np.random.randn(N, N)
b = np.random.randn(N)
correct_answer = A @ b

A = A.tolist()
b = b.tolist()
c = [None] * N

start = time.perf_counter()
for i in range(N):
    c[i] = str_per_vec(A[i], b, N)
end = time.perf_counter()

assert np.allclose(c, correct_answer)
print("Time:", end - start)
And the output was "Time: 6.585052800000001".
Then I tried another version. I just removed the function and moved its body into the loop itself:
import time
import numpy as np

N = 10000
A = np.random.randn(N, N)
b = np.random.randn(N)
correct_answer = A @ b

A = A.tolist()
b = b.tolist()
c = [None] * N

start = time.perf_counter()
for i in range(N):
    buf = 0
    a = A[i]
    for j in range(N):
        buf += a[j] * b[j]
    c[i] = buf
end = time.perf_counter()

assert np.allclose(c, correct_answer)
print("Time:", end - start)
And this time the output was "Time: 12.4580008".
So I only moved the code from the function into the loop, yet it took about twice as long to execute! I'm really confused, because I have no idea why this happens. If someone proficient in Python could help me, I would be very grateful!
P.S. I conducted tests several times and the results were stable.
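One experiment that might help narrow this down is sketched below. It assumes the difference has to do with where the loop variables live (function locals versus module-level globals) rather than with the function call itself. That is only a guess, not something the timings above confirm. The helper name `matvec_in_function` and the smaller size `N = 300` are made up for this sketch, and plain `random` is used in place of NumPy to keep it self-contained.

```python
import random
import time


def matvec_in_function(A, b, n):
    # Same nested loops as the second version, but here
    # c, buf, a, i and j are all local to the function.
    c = [None] * n
    for i in range(n):
        buf = 0.0
        a = A[i]
        for j in range(n):
            buf += a[j] * b[j]
        c[i] = buf
    return c


# Hypothetical smaller size so the comparison runs quickly;
# the post itself uses 10,000.
N = 300
A = [[random.gauss(0, 1) for _ in range(N)] for _ in range(N)]
b = [random.gauss(0, 1) for _ in range(N)]

# Variant 1: the whole double loop runs inside a function.
start = time.perf_counter()
c_local = matvec_in_function(A, b, N)
t_local = time.perf_counter() - start

# Variant 2: the same double loop at module level,
# so buf, a, i and j are module-level names.
c_global = [None] * N
start = time.perf_counter()
for i in range(N):
    buf = 0.0
    a = A[i]
    for j in range(N):
        buf += a[j] * b[j]
    c_global[i] = buf
t_global = time.perf_counter() - start

print("inside a function:", t_local)
print("at module level:  ", t_global)
```

If the module-level variant is consistently slower here as well, that would point at variable scope rather than at the function call. Running `dis.dis` on both variants would show which bytecode instructions each one uses for its variable accesses.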