
I need to multiply two 3D arrays in an unusual way. If it helps, I can "afford" to permute them (change their shape) as needed, since they are fairly small (less than (1_000, 200, 200) of np.complex128).

At the moment, I have the following inefficient triple-nested for loop:

import numpy as np

result = np.zeros((640, 39, 20))
a = np.random.rand(640, 640, 20)
b = np.random.rand(39, 640, 20)

# result[j, m, l] = sum over k of a[j, k, l] * b[m, k, l]
for j in range(640):
    for m in range(39):
        for l in range(20):
            result[j, m, l] = (a[j, :, l] * b[m, :, l]).sum()

How can I make the above as efficient as possible using numpy's magic?
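For what it's worth, here is how I read my own loop (the sizes below are placeholders I picked to keep the check fast, not the real ones): for each fixed l, the innermost sum is an ordinary matrix product, so even a single loop over the short last axis already removes most of the Python overhead. A minimal sanity check:

```python
import numpy as np

# Reduced sizes, purely to keep this check quick.
J, K, M, L = 8, 6, 5, 4
a = np.random.rand(J, K, L)
b = np.random.rand(M, K, L)

# Reference: the triple loop from the question.
ref = np.zeros((J, M, L))
for j in range(J):
    for m in range(M):
        for l in range(L):
            ref[j, m, l] = (a[j, :, l] * b[m, :, l]).sum()

# For each fixed l, result[:, :, l] is just a[:, :, l] @ b[:, :, l].T,
# so one loop over the short last axis replaces the inner two loops.
out = np.zeros((J, M, L))
for l in range(L):
    out[:, :, l] = a[:, :, l] @ b[:, :, l].T

np.testing.assert_allclose(out, ref)
```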

I know I could use numba and hope to beat numpy with compiled code and parallel=True, but first I want to see whether numpy alone suffices for my task.

EDIT: Does this also work for a more complex loop nest like the one below?

for l in range(20):
    for m in range(-l, l+1, 1):
        for j in range(640):
            result[j, m, l] = (a[j, :, l] * b[m, :, l]).sum()

After @hpaulj's comment, I now understand that the above is not possible.

Thank you!

velenos14
    `np.einsum('jkl,mkl->jml', a, b)` – Michael Szczesny Jul 28 '22 at 10:31
  • thank you, I will read about `np.einsum()`, but at first glance it seems that `np.einsum('jkl,mkl->jml', a, b) == result` does not produce only `True`s inside the obtained boolean array – velenos14 Jul 28 '22 at 10:36
  • 1
    I compared the results with `np.testing.assert_allclose` for the provided example data. Checking equality of arrays of type `np.float` is not reliable. – Michael Szczesny Jul 28 '22 at 10:39
  • hmm, that's interesting: so the obtained numbers are indeed equal and the operations are similar, but why doesn't the computer produce the exact same floats both ways? Does `np.einsum()` do other (probably more optimized) manipulations behind the scenes than my for loops above and `.sum()`? – velenos14 Jul 28 '22 at 10:42
  • 1
    For example, the `sum` is calculated iteratively, not at the end. – Michael Szczesny Jul 28 '22 at 10:43
  • 3
    FP math is *not associative*. Using SIMD instructions can slightly change the result, and multithreading causes this too; in those cases the result is generally a bit *more* precise. Einsum is not guaranteed to give the exact same result. In fact, `np.sum` uses a pairwise algorithm and is still far from giving the best precision. A naive loop should give pretty bad accuracy compared to all the other methods. The most accurate methods are very slow and complex, which is why Numpy (and nearly every other program) does not use them. See [Is floating point math broken?](https://stackoverflow.com/questions/588004). – Jérôme Richard Jul 28 '22 at 14:25
  • @MichaelSzczesny, would you mind having a look at a more complex indexing I included in the updated question? Thank you – velenos14 Jul 28 '22 at 14:46
  • 2
    Variable loop ranges like `range(-l, l+1, 1)` cannot be handled with whole-array methods. – hpaulj Jul 28 '22 at 15:08
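Putting Michael Szczesny's `np.einsum` suggestion together with his `np.testing.assert_allclose` check, the comparison from the comments looks roughly like this (array sizes reduced from the question's, purely to keep the demo quick):

```python
import numpy as np

a = np.random.rand(64, 64, 20)  # stand-in for the (640, 640, 20) array
b = np.random.rand(39, 64, 20)

# Loop version from the question.
ref = np.zeros((64, 39, 20))
for j in range(64):
    for m in range(39):
        for l in range(20):
            ref[j, m, l] = (a[j, :, l] * b[m, :, l]).sum()

# Vectorized version from the comments: contract over k, keep j, m, l.
res = np.einsum('jkl,mkl->jml', a, b)

# Elementwise `==` is the wrong check for floats, since the two code paths
# sum in a different order; compare with a tolerance instead.
np.testing.assert_allclose(res, ref)
```

This is why `res == result` in the comment above was not all `True`: both arrays hold the same values up to floating-point rounding, just not bit-identical ones.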

0 Answers