
Let's say you have two relatively large arrays a and b. You could use broadcasting to compute a dot product:

import numpy as np

a = np.random.normal(size=(1000, 300))
b = a * 2

ref = a.dot(b.T)
res = (a[:, None, :] * b[None, :, :]).sum(2)
np.allclose(ref, res)
# True

%timeit -n 1 -r 5 a.dot(b.T)
# 17.6 ms
%timeit -n 1 -r 5 (a[:, None, :] * b[None, :, :]).sum(2)
# 1.83 s

The performance difference is roughly two orders of magnitude, and even larger for bigger arrays. np.dot is much faster because it calls optimized BLAS routines, but also because it never materializes the full temporary array a[:,None,:]*b[None,:,:] in memory.

np.dot always does a multiplication followed by a sum-reduction. I wonder if it is possible to replace the multiplication with any other elementwise operation (like np.maximum, ==, np.power, ...) and the sum-reduction with another reduction such as np.max. I know this would only have niche real-world applications, but I think it could be useful in some cases.
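For instance, a maximum/max version of the "product" can already be written with the same broadcasting trick as above (illustrative only; it builds the same huge intermediate and is just as slow):

# e.g. replace multiply/sum with maximum/max; this still materializes
# the full (1000, 1000, 300) temporary, just like the broadcast sum above
res_max = np.maximum(a[:, None, :], b[None, :, :]).max(2)
res_max.shape
# (1000, 1000)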

I have tried using numexpr, but it has limited support for funcs and dtypes.

I have also tried creating an array with dtype=object, with custom multiplication and sum methods, but then you are no longer working with contiguous chunks of memory.

Is there any effective way to accomplish this in numpy?

Ideally, something with a syntax like: np.custom_dot(a, b, elementwise_func=my_func, redux_func=my_other_func)
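For reference, here is a naive pure-numpy sketch of what I mean (custom_dot is a made-up name; blocking over rows of a keeps the temporary small, but it is nowhere near np.dot speed):

def custom_dot(a, b, elementwise_func=np.multiply, redux_func=np.add, block=64):
    # Both funcs are assumed to be ufuncs, so that redux_func.reduce works.
    # Process `block` rows of a at a time to limit the temporary to
    # shape (block, b.shape[0], a.shape[1]).
    out = np.empty((a.shape[0], b.shape[0]), dtype=np.result_type(a, b))
    for i in range(0, a.shape[0], block):
        chunk = elementwise_func(a[i:i+block, None, :], b[None, :, :])
        out[i:i+block] = redux_func.reduce(chunk, axis=2)
    return out

np.allclose(custom_dot(a, b), a.dot(b.T))
# True
np.allclose(custom_dot(a, b, np.maximum, np.maximum), res_max)
# True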

Brenlla
  • can't you write your own specialized library? Maybe in cython for a start. – hpaulj May 17 '19 at 11:51
  • @hpaulj I have used cython a couple of times for simple things (and often times I get this [error](https://stackoverflow.com/questions/53172601/error-unable-to-find-vcvarsall-bat-when-compiling-cython-code)). I was hoping for a simpler pure `numpy` solution – Brenlla May 17 '19 at 11:59
  • This sounds like something that numba (https://github.com/numba/numba) would be perfect for (see the sketch after these comments). – user545424 May 17 '19 at 13:24
  • The original `np.einsum` (now embodied in the `c_einsum` function) used cython and `nditer` to perform the sum of products over a complex combination of dimensions. I dug into this years ago when I wrote a patch for the use of ellipsis. At the time there was also an issue request for an `einsum` that could use other pairs of operators. `np.nditer` doesn't help with performance on its own, but as shown at https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html it can be a stepping stone toward using it in cython. – hpaulj May 17 '19 at 16:14
  • `np.einsum('ij,kj',a,b, optimize=True)` gives the same timings as your dot. `np.einsum('ij,kj',a,b, optimize=False)` uses the original (nditer) version – hpaulj May 17 '19 at 16:26
  • "np.dot always does a multiplication and then a sum-reduction" At first make sure, that you know how a matrix multiplication is usually implemented. eg. https://gist.github.com/nadavrot/5b35d44e8ba3dd718e595e40184d03f0 How far do you want to go (Register Blocking,Tilling,...)? – max9111 May 28 '19 at 11:33
  • @max9111 That link is pretty interesting. I don't want to go very far, I was hoping for a simple numpy solution. I may do something with cython in the future. – Brenlla May 29 '19 at 11:02
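A minimal sketch of the numba route suggested in the comments above (an assumption about how it might look, not a tested benchmark; it hard-codes one pair, elementwise maximum plus max reduction, and assumes float64 inputs):

import numba as nb
import numpy as np

@nb.njit(parallel=True)
def max_dot(a, b):
    # out[i, j] = max over k of max(a[i, k], b[j, k]); other operation
    # pairs would need their own (or generated) kernels like this one.
    out = np.empty((a.shape[0], b.shape[0]))  # float64 result assumed
    for i in nb.prange(a.shape[0]):
        for j in range(b.shape[0]):
            acc = max(a[i, 0], b[j, 0])
            for k in range(1, a.shape[1]):
                v = max(a[i, k], b[j, k])
                if v > acc:
                    acc = v
            out[i, j] = acc
    return out

max_dot(a, b) avoids the large temporary entirely; whether it gets anywhere near np.dot speed would have to be measured.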

0 Answers