11

Hi I'm relatively new here and trying to do some calculations with numpy. I'm experiencing a long elapse time from one particular calculation and can't work out any faster way to achieve the same thing.

Basically its part of a ray triangle intersection algorithm and I need to calculate all the vector cros products from two matrices of different sizes.

The code I was using was :

allhvals1 = numpy.cross( dirvectors[:,None,:], trivectors2[None,:,:] )

where dirvectors is an array of n* vectors (xyz) and trivectors2 is an array of m*vectors(xyz). allhvals1 is an array of the cross products of size n*M*vector (xyz). This works but is very slow. It's essentially the n*m matrix of each vector from each array. Hope that you understand. The sizes of each varies from approx 1-4000 depending on parameters (I basically chunk the dirvectors dependent on size).

Any advice appreciated. Unfortunately my matrix math is somewhat flakey.

user1942439
  • 157
  • 1
  • 8
  • 1
    Not to be that guy but, this isn't a forum :) I'm mentioning it because there are too many people treating this site like a forum. There's nothing wrong with your question though. – keyser Jan 03 '14 at 16:57
  • It's possible that a large fraction of the time is spent in extracting the vectors, rather than in the cross product. I would try extracting them into variables ahead of doing the product. Then I would use [*this technique*](http://stackoverflow.com/a/4299378/23771) to get better insight. – Mike Dunlavey Sep 23 '16 at 13:20

2 Answers2

20

If you look at the source code of np.cross, it basically moves the xyz dimension to the front of the shape tuple for all arrays, and then has the calculation of each of the components spelled out like this:

x = a[1]*b[2] - a[2]*b[1]
y = a[2]*b[0] - a[0]*b[2]
z = a[0]*b[1] - a[1]*b[0]

In your case, each of those products requires allocating huge arrays, so the overall behavior is not very efficient.

Lets set up some test data:

u = np.random.rand(1000, 3)
v = np.random.rand(2000, 3)

In [13]: %timeit s1 = np.cross(u[:, None, :], v[None, :, :])
1 loops, best of 3: 591 ms per loop

We can try to compute it using Levi-Civita symbols and np.einsum as follows:

eijk = np.zeros((3, 3, 3))
eijk[0, 1, 2] = eijk[1, 2, 0] = eijk[2, 0, 1] = 1
eijk[0, 2, 1] = eijk[2, 1, 0] = eijk[1, 0, 2] = -1

In [14]: %timeit s2 = np.einsum('ijk,uj,vk->uvi', eijk, u, v)
1 loops, best of 3: 706 ms per loop

In [15]: np.allclose(s1, s2)
Out[15]: True

So while it works, it has worse performance. The thing is that np.einsum has trouble when there are more than two operands, but has optimized pathways for two or less. So we can try to rewrite it in two steps, to see if it helps:

In [16]: %timeit s3 = np.einsum('iuk,vk->uvi', np.einsum('ijk,uj->iuk', eijk, u), v)
10 loops, best of 3: 63.4 ms per loop

In [17]: np.allclose(s1, s3)
Out[17]: True

Bingo! Close to an order of magnitude improvement...

Some performance figures for NumPy 1.11.0 with a=numpy.random.rand(n,3), b=numpy.random.rand(n,3):

enter image description here

The nested einsum is about twice as fast as cross for the largest n tested.

Nico Schlömer
  • 53,797
  • 27
  • 201
  • 249
Jaime
  • 65,696
  • 17
  • 124
  • 159
  • By the way, hat tip to @hpaulj's [comment from yesterday](http://stackoverflow.com/questions/20889514/special-tensor-contraction-in-python/20890335#comment31355353_20889514) that reminded me what Levi-Civita symbols are all about. – Jaime Jan 03 '14 at 18:32
  • I had not known about `np.einsum` until now. Very cool. And now I'm disappointed that the only college class of mine that mentioned the Levi-Civita symbol mentioned it only in passing when discussing various ways of representing the cross product (or perhaps it was something about permutations of matrix rows or columns in general; we never returned to them afterwards so I don't remember very well). That's what I get for being an engineer rather than a mathematician, I suppose. – JAB Jan 03 '14 at 18:33
  • I'll give that a try and see how it performs. I had looked at einsum but didn't really understand enough on it. – user1942439 Jan 03 '14 at 18:51
  • Just to confirm this worked nicely. Still got some more speed ups to do on the rest of this routine but this certainly helped a lot. – user1942439 Jan 03 '14 at 20:12
  • @user1942439 If you find this answers your question, you may want to consider accepting it by clicking the checkmark next to it... – Jaime Jan 03 '14 at 21:19
  • good one; I would have figured np.cross was smarter than that – Eelco Hoogendoorn Apr 26 '16 at 09:21
  • The implementation of cross has changed after this question was posted, should do a better job of not allocating unnecessary intermediate storage now. – Jaime Apr 26 '16 at 09:24
  • `np.einsum` is now as fast with three operands as it is with the nested formulation when using `optimize = True` – Daniel F Sep 14 '18 at 06:27
0

While writing dynamic simulations for underwater vehicles I have found this method for fast cross product:

https://github.com/simena86/Simulink-Underwater-Robotics-Simulator/blob/master/3rdparty/gnc_mfiles/Smtrx.m

Which works well, it is written in Matlab but the code is very simple. Just read the comments at the top.

  • 1
    It is better to translate the Matlab codes into python codes. – zangw Apr 26 '16 at 07:56
  • While a link is great, please [provide context to your link(s)](http://meta.stackoverflow.com/a/8259/169503), as an *answer* should actually contain an answer. Be aware that being *barely more than a link to an external site* is a possible reason as to [Why and how are some answers deleted?](http://stackoverflow.com/help/deleted-answers). – Jan Apr 26 '16 at 07:56