Let's say you have two relatively large arrays `a` and `b`. You could use broadcasting to do a dot product:
import numpy as np

a = np.random.normal(size=(1000, 300))
b = a*2
ref = a.dot(b.T)
res = (a[:,None,:]*b[None,:,:]).sum(2)
np.allclose(ref, res)
# True
%timeit -n 1 -r 5 a.dot(b.T)
# 17.6 ms
%timeit -n 1 -r 5 (a[:,None,:]*b[None,:,:]).sum(2)
# 1.83 s
The performance difference is roughly two orders of magnitude, and it gets even larger for bigger arrays.
`np.dot` is much faster because it uses specialized libraries (BLAS), but also because it never stores the full temporary array `a[:,None,:]*b[None,:,:]` in memory.
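To put a number on that memory cost (my own back-of-the-envelope check, not from any profiler), the broadcast temporary for the shapes above would be a `(1000, 1000, 300)` float64 array:

```python
import numpy as np

# Shapes from the example above: a is (1000, 300), b is (1000, 300).
# Broadcasting a[:, None, :] * b[None, :, :] materializes a
# (1000, 1000, 300) float64 temporary before the reduction.
temp_shape = (1000, 1000, 300)
temp_bytes = np.prod(temp_shape) * np.dtype(np.float64).itemsize
print(temp_bytes / 1e9)  # 2.4, i.e. ~2.4 GB for the intermediate alone
```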
`np.dot` always does a multiplication followed by a sum-reduction. I wonder if it would be possible to replace the multiplication with any other elementwise operation (like `np.maximum`, `==`, `np.power`...) and the sum-reduction with another reduction operation, like `np.max`. I know this would have niche real-world applications, but I think it could be useful in some cases.
I have tried using `numexpr`, but it has limited support for functions and dtypes.
I have also tried creating an array with `dtype=object` and custom multiplication and sum methods, but then you are no longer dealing with contiguous chunks of memory.
Is there any effective way to accomplish this in NumPy? Ideally some function with the syntax `np.custom_dot(a, b, elementwise_func=my_func, redux_func=my_other_func)`.
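As a baseline while I keep looking, here is a minimal sketch of what I mean, with the hypothetical name and signature from above (this is my own chunked workaround, not a NumPy API): it applies `elementwise_func` to row pairs and reduces with `redux_func.reduce`, processing `a` in blocks to bound the size of the temporary.

```python
import numpy as np

def custom_dot(a, b, elementwise_func=np.multiply, redux_func=np.add,
               chunk=128):
    # Hypothetical helper: combines every row of a with every row of b
    # using elementwise_func, then reduces over the shared last axis
    # with redux_func.reduce. Chunking caps the temporary at
    # chunk * len(b) * a.shape[1] elements instead of the full cube.
    out = np.empty((a.shape[0], b.shape[0]), dtype=np.result_type(a, b))
    for i in range(0, a.shape[0], chunk):
        block = elementwise_func(a[i:i+chunk, None, :], b[None, :, :])
        out[i:i+chunk] = redux_func.reduce(block, axis=2)
    return out

a = np.random.normal(size=(1000, 300))
b = a * 2
print(np.allclose(custom_dot(a, b), a.dot(b.T)))  # True
```

This bounds memory but still evaluates the elementwise op through a Python-level loop of broadcast operations, so it does not get anywhere near the fused, cache-friendly speed of `np.dot`.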