
I have two 2-d numpy arrays with the same dimensions, A and B, and am trying to calculate the row-wise dot product of them. I could do:

np.sum(A * B, axis=1)

Is there another way to do this so that numpy is doing the row-wise dot product in one step rather than two? Maybe with tensordot?
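(An aside on the `tensordot` idea: `tensordot` contracts whole axes, so it produces every pairwise dot product between rows of A and rows of B, not just the matching ones. A minimal sketch, assuming A and B are same-shaped 2-d arrays:)

```python
import numpy as np

A = np.random.rand(6, 4)
B = np.random.rand(6, 4)

# Contracting axis 1 of both gives the full (6, 6) matrix of pairwise dots...
pairwise = np.tensordot(A, B, axes=(1, 1))

# ...whose diagonal is the row-wise dot product.
print(np.allclose(np.diag(pairwise), np.sum(A * B, axis=1)))
# True
```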

Roger Fan
Neil G
  • Are A and B matrices? I'm assuming with identical dimensions? – Roger Fan Oct 02 '14 at 19:56
  • [Note that "matrix" is a technical word in numpy-speak, and differs from "array": if A and B are arrays, not matrices, then `A*B` is an elementwise product, not a dot product.] – DSM Oct 02 '14 at 19:58
  • @DSM: Yes, it's elementwise. Looking forward to the `@` operator so that we can forget about `array` :) – Neil G Oct 02 '14 at 20:05
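(Following up on the `@` operator mentioned above: on Python 3.5+ / NumPy 1.10+, `matmul` broadcasts over leading axes, so the row-wise dot product can also be written as a batched matrix product. A small sketch, assuming same-shaped 2-d arrays:)

```python
import numpy as np

A = np.arange(12).reshape(6, 2)
B = np.arange(12).reshape(6, 2)

# Treat each row pair as a (1, k) @ (k, 1) product: (6,1,k) @ (6,k,1) -> (6,1,1)
rowwise = (A[:, None, :] @ B[:, :, None])[:, 0, 0]

print(np.allclose(rowwise, np.sum(A * B, axis=1)))
# True
```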

3 Answers


This is a good application for numpy.einsum.

a = np.random.randint(0, 5, size=(6, 4))
b = np.random.randint(0, 5, size=(6, 4))

res1 = np.einsum('ij, ij->i', a, b)
res2 = np.sum(a*b, axis=1)

print(res1)
# [18  6 20  9 16 24]

print(np.allclose(res1, res2))
# True

einsum also tends to be a bit faster.

a = np.random.normal(size=(5000, 1000))
b = np.random.normal(size=(5000, 1000))

%timeit np.einsum('ij, ij->i', a, b)
# 100 loops, best of 3: 8.4 ms per loop

%timeit np.sum(a*b, axis=1)
# 10 loops, best of 3: 28.4 ms per loop
Roger Fan

Even faster is `inner1d` from `numpy.core.umath_tests` (note that this is an internal module whose import has been deprecated since NumPy 1.16, so on newer versions you may need to fall back to `einsum`):

[Benchmark plot: runtime vs. array length for the `sum`, `einsum`, and `inner1d` kernels; `inner1d` is the fastest across the tested sizes.]

Code to reproduce the plot:

import numpy
from numpy.core.umath_tests import inner1d
import perfplot


perfplot.show(
        setup=lambda n: (numpy.random.rand(n, 3), numpy.random.rand(n, 3)),
        kernels=[
            lambda a: numpy.sum(a[0]*a[1], axis=1),
            lambda a: numpy.einsum('ij, ij->i', a[0], a[1]),
            lambda a: inner1d(a[0], a[1])
            ],
        labels=['sum', 'einsum', 'inner1d'],
        n_range=[2**k for k in range(20)],
        xlabel='len(a), len(b)',
        logx=True,
        logy=True
        )
Nico Schlömer

Even though it is significantly slower for even moderate data sizes, I would use

np.diag(A.dot(B.T))

while developing the library, and worry about optimizing it later, once it is running in a production setting or once the unit tests are written.

To most people who come upon your code, this will be more understandable than `einsum`, and it doesn't require you to break best practices by embedding your calculation in a mini-DSL string passed as the argument to some function call.

I agree that computing the off-diagonal elements is worth avoiding for large cases. It would have to be really, really large for me to care about that, though, and the trade-off of paying the awful price of expressing the calculation in an embedded `einsum` string is pretty severe.
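(For concreteness, the trade-off being debated can be seen directly. A small sketch showing that `np.diag(A.dot(B.T))` matches the row-wise dot product but materializes the full n-by-n product first, which is what makes it slow:)

```python
import numpy as np

A = np.random.rand(500, 40)
B = np.random.rand(500, 40)

full = A.dot(B.T)         # (500, 500): computes every pairwise dot product
readable = np.diag(full)  # keep only the diagonal (the row-wise dots)

print(np.allclose(readable, np.sum(A * B, axis=1)))
# True
# The intermediate is O(n^2) in time and memory, versus O(n*k) for the sum/einsum forms.
```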

ely
  • Using the example in my question, this solution takes almost 100x as long as the `einsum` example and about 20x as long as the `np.sum(A * B, axis=1)` one. I think `einsum` is a powerful and expressive tool, but if you really don't like it then you should use the solution in the question over this. – Roger Fan Oct 02 '14 at 20:29
  • Two orders of magnitude would be worth it to me to avoid the way the calculation is expressed in `einsum` in all but the most severe of production environments. I might also consider what needs to be done with the resulting vector of sums and write a generator with a stupid pure Python for loop performing the use of `.dot` manually on each pair of rows. If I only needed to consume the output once, that would be even better. – ely Oct 02 '14 at 20:31
  • +1 for the good point, although I prefer either of the more efficient solutions. It can be hard to debug when your input becomes big and all of a sudden everything is really slow. – Neil G Oct 02 '14 at 20:34
  • I'm getting over half a second for two 5000x1000 arrays. That's not even close to a large-scale problem, and that's already a pretty significant amount of time. Either way, there's absolutely no advantage to this over the `np.sum(A*B, axis=1)` method. That also avoids `einsum` and is 20x faster. – Roger Fan Oct 02 '14 at 20:34
  • I agree that `einsum` is opaque. Thing is, I can always write the efficient solution and then add a comment. I also have a design document, which explains everything I'm doing. – Neil G Oct 02 '14 at 20:37
  • @NeilG I actually don't think `einsum` is opaque, you just need to know [Einstein summation notation](http://en.wikipedia.org/wiki/Einstein_notation). If you're comfortable with that, `einsum` is a powerful, fast, and extremely expressive way to perform a large set of matrix operations. – Roger Fan Oct 02 '14 at 20:40
  • If you dislike strings so much, you can also tell einsum to do its thing, for the OP's use case, as: `np.einsum(a, [0,1], b, [0,1], [0])`. It would be good if you could specify a few of the "so many reasons" that make `einsum` such a bad idea. As is, I am strongly tempted to downvote this for giving bad advice... – Jaime Oct 02 '14 at 20:41
  • Embedding a calculation into an obfuscated string format is bad because it hurts modularity and increases hardcoding. To detect whether there is a problem in the calculation, it's not feasible to break it up into smaller algebraic steps, as it would be with chained application of basic linear algebra steps or canned functions. If you've got an index in the wrong place, that may be entirely non-obvious to someone reading the code and very hard to debug. – ely Oct 02 '14 at 20:45
  • In a sense, it shares a lot of the downsides of using something like `eval` -- the code that gets fed to `eval` becomes a magic constant as far as users of the code are concerned and possibly also a parameter that has to be understood and kept track of to understand or debug the program. – ely Oct 02 '14 at 20:53
  • @EMS: Isn't *all code* "a string constant that meant something to the person writing the code"? I don't see what's wrong with just adding a comment to explain unobvious code. – Neil G Oct 02 '14 at 21:03
  • It's really disingenuous to draw that comparison. For example, suppose that instead of offering its API with functions, `NumPy` offered just one function, `np.eval` but it allowed you to use keywords inside a big string that it would interpret as math functions. So you could say, `x = np.eval('g = sin(y); x = cos(g);')` and it would look up `y` in the local namespace, apply the functions, and return whatever `x` got bound to within the string context back into the namespace as the variable named `x`. Then suppose someone accidentally typed `sign` instead of `sin`. – ely Oct 02 '14 at 21:07
  • You mean because you're not getting line numbers? Python's eval could theoretically provide that. Everything in Python can be theoretically done at runtime, so this distinction is IMHO a leftover from what I guess is your background with statically compiled languages. :) – Neil G Oct 02 '14 at 21:32
  • EMS: your opinion is wrong. `einsum` is a godsend for writing readable and bug-free linear products. Explicitly specifying axes clarifies intent both to other programmers and to the computer. – Eelco Hoogendoorn Oct 02 '14 at 21:41
  • prpl: sorry for the necro. Of course I agree that DSL strings have their drawbacks; expressing everything as an expression graph in the native language is of course much cleaner. That is, if the native language permits it; if expressing your statement in the native language feels a lot like fitting a square peg into a round hole, a little DSL may not be so bad. And as far as linear algebra is concerned, it's a DSL that the people who care about a function like einsum should already know and love. But the fact that the einsum syntax forces one to explicitly name all the axes is a good thing. – Eelco Hoogendoorn Dec 27 '14 at 11:09
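(Jaime's comment above refers to einsum's alternative calling convention, which takes operands interleaved with integer axis-label lists instead of a subscript string. A quick sketch checking it against the string form:)

```python
import numpy as np

a = np.random.rand(6, 4)
b = np.random.rand(6, 4)

# String form: sum over j for each i
res_str = np.einsum('ij,ij->i', a, b)

# Interleaved form: each operand is followed by its axis labels,
# and the final list names the output axes.
res_int = np.einsum(a, [0, 1], b, [0, 1], [0])

print(np.allclose(res_str, res_int))
# True
```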