
Given two large numpy arrays, one holding a list of 3D points and the other a list of transformation matrices, and assuming there is a 1-to-1 correspondence between the two lists, I'm looking for the best way to calculate the result array of each point transformed by its corresponding matrix.

My solution was to use slicing (see "test4" in the example code below), which worked fine with small arrays but fails with large arrays because of how memory-wasteful my method is :)

import numpy as np
COUNT    = 100
matrix   = np.random.random_sample((3,3,))   # A single matrix
matrices = np.random.random_sample((COUNT,3,3,)) # Many matrices
point    = np.random.random_sample((3,))     # A single point
points   = np.random.random_sample((COUNT,3,)) # Many points

# Test 1, result of a single point multiplied by a single matrix
# This is as easy as it gets
test1 = np.dot(point,matrix)
print('done')

# Test 2, result of a single point multiplied by many matrices
# This works well and returns a transformed point for each matrix
test2 = np.dot(point,matrices)
print('done')

# Test 3, result of many points multiplied by a single matrix
# This works also just fine
test3 = np.dot(points,matrix)
print('done')

# Test 4, this is the case i'm trying to solve. Assuming there's a 1-1
# correspondence between the point and matrix arrays, the result i want
# is an array of points, where each point has been transformed by it's
# corresponding matrix
test4 = np.zeros((COUNT,3))
for i in range(COUNT):
    test4[i] = np.dot(points[i], matrices[i])
print('done')

With a small array this works fine. With large arrays (COUNT = 1000000), Test #4 works but gets rather slow.

Is there a way to make Test #4 faster, ideally without using a loop?

Fnord
  • @WarrenWeckesser correct, and my apologies for this glaring mistake. I wrote this with two screaming kids running around me and must have copy pasted wrong :) – Fnord Oct 02 '15 at 06:24

2 Answers


You can use numpy.einsum. Here's an example with 5 matrices and 5 points:

In [49]: matrices.shape
Out[49]: (5, 3, 3)

In [50]: points.shape
Out[50]: (5, 3)

In [51]: p = np.einsum('ijk,ik->ij', matrices, points)

In [52]: p[0]
Out[52]: array([ 1.16532051,  0.95155227,  1.5130032 ])

In [53]: matrices[0].dot(points[0])
Out[53]: array([ 1.16532051,  0.95155227,  1.5130032 ])

In [54]: p[1]
Out[54]: array([ 0.79929572,  0.32048587,  0.81462493])

In [55]: matrices[1].dot(points[1])
Out[55]: array([ 0.79929572,  0.32048587,  0.81462493])

The above computes matrices[i].dot(points[i]) (i.e. multiplying with the point on the right), but I just reread the question and noticed that your code uses points[i].dot(matrices[i]). You can do that by switching the indices and arguments of einsum:

In [76]: lp = np.einsum('ij,ijk->ik', points, matrices)

In [77]: lp[0]
Out[77]: array([ 1.39510822,  1.12011057,  1.05704609])

In [78]: points[0].dot(matrices[0])
Out[78]: array([ 1.39510822,  1.12011057,  1.05704609])

In [79]: lp[1]
Out[79]: array([ 0.49750324,  0.70664634,  0.7142573 ])

In [80]: points[1].dot(matrices[1])
Out[80]: array([ 0.49750324,  0.70664634,  0.7142573 ])
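For comparison, the same per-row product can also be written with broadcasted matrix multiplication (`np.matmul`, i.e. the `@` operator), by giving each point a leading singleton axis so it becomes a 1×3 row vector — a sketch:

```python
import numpy as np

matrices = np.random.random_sample((5, 3, 3))
points = np.random.random_sample((5, 3))

# einsum version: result[i] = points[i] . matrices[i]
lp = np.einsum('ij,ijk->ik', points, matrices)

# Broadcasted matmul: (5, 1, 3) @ (5, 3, 3) -> (5, 1, 3),
# then drop the singleton axis to get back to (5, 3).
mm = (points[:, None, :] @ matrices)[:, 0, :]

print(np.allclose(lp, mm))  # True
```

Which of the two is faster can depend on the NumPy version and array sizes, so it's worth timing both on your data.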
Warren Weckesser

It doesn't make much sense to have multiple transform matrices. You can combine transform matrices as in this question:

If I want to apply matrix A, then B, then C, I will multiply the matrices in reverse order np.dot(C,np.dot(B,A))

So you can save some memory by precomputing that combined matrix. Then applying a bunch of vectors to a single transform matrix is easily handled (within reason).
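As a sketch of that idea (with hypothetical matrices A, B, C standing in for the individual transforms):

```python
import numpy as np

A = np.random.random_sample((3, 3))
B = np.random.random_sample((3, 3))
C = np.random.random_sample((3, 3))
points = np.random.random_sample((1000, 3))

# To apply A, then B, then C: compose once, multiplying in reverse order...
combined = np.dot(C, np.dot(B, A))

# ...then transform every point with the single combined matrix.
# (Points are treated as column vectors multiplied on the left, so the
# row-wise form uses the transpose.)
out = points @ combined.T

# Same result as applying the transforms one at a time:
stepwise = ((points @ A.T) @ B.T) @ C.T
print(np.allclose(out, stepwise))  # True
```

Note this only helps when every point goes through the *same* chain of transforms; it doesn't apply to the question's case of a different matrix per point.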

I don't know why you would need one million transformations on one million vectors, but I would suggest buying more RAM.

Edit: There isn't a way to reduce the operations, no. Unless your transformation matrices hold a specific property such as sparsity, diagonality, etc., you're going to have to run all the multiplications and summations. However, the way you process these operations can be optimized across cores and/or using vector operations on GPUs.

Also, Python is notably slow. You can try splitting the NumPy work across your cores using NumExpr. Or maybe use a BLAS framework in C++ (notably quick ;))

Aidan Gomez
  • See my edit to test #4 and my last paragraph. This should clarify things a little. – Fnord Oct 02 '15 at 04:47
  • @Fnord by 1-1 correspondence, are you asserting that they're of the same length? – Aidan Gomez Oct 02 '15 at 04:50
  • yes, test #4 would be a case where each point of one list is to be multiplied by a matrix of corresponding index in the other list. – Fnord Oct 02 '15 at 06:02
  • Warren's answer is ~33x faster than mine, so i'll take it :) Thanks for your input! – Fnord Oct 02 '15 at 17:28
  • @Fnord Wow! That's great to hear. Would love to see where exactly it sped things up? – Aidan Gomez Oct 02 '15 at 18:04
  • @Fnord Oh cool, check this out: http://stackoverflow.com/questions/18365073/why-is-numpys-einsum-faster-than-numpys-built-in-functions! So, einsum is rather new and apparently highly optimized in comparison to legacy numpy methods. – Aidan Gomez Oct 02 '15 at 18:14