import numpy as np
from datetime import datetime as time

a = np.random.randn(10000, 64, 8)
b = np.random.randn(8, 100)

# One np.dot call on the whole 3-D stack
t1 = time.now()
res1 = np.dot(a, b)
t2 = time.now()

# One 2-D matrix product per slice, in a Python loop
res2 = np.zeros((10000, 64, 100))
for i in range(len(a)):
    res2[i] = a[i].dot(b)
t3 = time.now()

print('With loop: {}\nWithout: {}'.format((t3 - t2).total_seconds() * 1000,
                                          (t2 - t1).total_seconds() * 1000))

With loop: 562.9920000000001
Without: 2124.908

I chose the dimensions randomly, but they show the difference. Why is it so huge? Also, as I increase the size of the arrays, the gap only grows. With small arrays, np.dot without the loop shows better performance than it does with one.
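For reference, a more reliable benchmark with the `timeit` module (as suggested in the comments below) might look like the following minimal sketch. It uses the same shapes as above; the helper names are just for illustration, and absolute numbers depend on the machine and BLAS backend:

import timeit
import numpy as np

a = np.random.randn(10000, 64, 8)
b = np.random.randn(8, 100)

def with_dot():
    # One call on the whole 3-D stack
    return np.dot(a, b)

def with_loop():
    # One 2-D matrix product per slice
    out = np.empty((10000, 64, 100))
    for i in range(len(a)):
        out[i] = a[i].dot(b)
    return out

def with_matmul():
    # np.matmul / @ is designed for stacks of 2-D matrices
    return a @ b

for f in (with_dot, with_loop, with_matmul):
    ms = timeit.timeit(f, number=5) / 5 * 1000
    print('{}: {:.1f} ms'.format(f.__name__, ms))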

  • Using the difference between calls to `time.now()` is not an accurate way to time things. Use the `timeit` module. **Edit**: I am not suggesting that *this* is the reason for the behavior you see. – pault Aug 20 '18 at 18:05
  • I don't think that matters a lot, because np.dot takes 2 seconds without the loop, and I can count those even without any timing modules. – Nurislam Fazulzyanov Aug 20 '18 at 18:07
  • Oh, thanks. I'll do that immediately. – Nurislam Fazulzyanov Aug 20 '18 at 18:10
  • Using the `timeit` module accounts for a lot of other variables during a benchmark that your code currently does not. The fact that the differences in timing are on the order of seconds is secondary (no pun intended). – code_dredd Aug 20 '18 at 18:10
  • Yes, but I don't need to try other variables, because the ones I chose are similar to those I use in my actual code, and as I said, "as I increase dimensionality of arrays it only grows". With small dimensionalities, np.dot without a loop has better performance. – Nurislam Fazulzyanov Aug 20 '18 at 18:13
  • The duplicate seems to indicate the first dimensions being smaller is somehow related, which is not the case here. Also, I concur the `timeit` pedantry is uncalled for; just run the code and count sheep in this case. – kabanus Aug 20 '18 at 18:14
  • The timings for these sample arrays show the same sort of pattern as in the proposed duplicate, allowing for changes in implementation in the past several years. The `dot` with a 3d `a` is still noticeably slower. `a@b` is comparable to the loop. Multiple 2d matrix products is exactly the kind of case that `matmul` was intended to handle. – hpaulj Aug 20 '18 at 19:02
  • @pault That's definitely good advice—and if you reverse the order of these, the difference is significantly smaller, which implies that whichever one is being run first is getting charged for a big `malloc` that the other one isn't paying for, or something like that. However, even testing with `%%timeit`, there is still a difference, and the loop version is still faster. The OP should definitely rewrite the code to time things properly, but the question seems to be still valid. – abarnert Aug 20 '18 at 20:01
  • According to `%%timeit`, on my laptop, `a@b` takes 467ms, `np.dot(a, b)` takes 1501ms, `a.dot(b)` takes 1590ms, and the loop takes 585ms. – abarnert Aug 20 '18 at 20:02
  • Also, the two `dot` calls give slightly different answers than the `@` and loop—e.g., `1.7746607483805736` vs. `1.7746607483805732` at `[0, 0, 2]`. – abarnert Aug 20 '18 at 20:04
  • Anyway, I think the answer on the proposed dup is the answer here, even if the question isn't obviously the same question. But maybe someone can find a dup that matches the question better? – abarnert Aug 20 '18 at 20:06
  • If a.shape[1], a.shape[2] and b.shape[1] are small (as in your case), there are faster methods than a@b to get the result (I measured about a factor of two by implementing it within Numba). Nevertheless, your timings look really slow in both cases. Which processor do you have and which BLAS backend are you using? – max9111 Aug 21 '18 at 11:17
  • Processor: AMD FX-8370 – Nurislam Fazulzyanov Aug 21 '18 at 11:38
  • I have no idea what BLAS is, but I ran the code in an IPython notebook, if that matters (a way to check the backend is sketched below). – Nurislam Fazulzyanov Aug 21 '18 at 11:49
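Regarding the last two comments: a quick way to see which BLAS/LAPACK backend NumPy is linked against is the sketch below; the output varies by installation, and a slow reference BLAS can explain poor timings.

import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against
# (e.g. OpenBLAS or MKL)
np.show_config()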
