Fastest way to get dot product of a matrix and each point in the given array

Question

I have an array Nx3 of N points, each one has X, Y and Z coordinate. I need to rotate each point, so i have the rotation matrix 3x3. To do this, i need to get dot product of the rotation matrix and each point. The problem is the array of points is quite massive (~1_000_000x3) and therefore it takes noticeable amount of time to compute the rotated points.

The only solution i come up with so far is simple for loop iterating over array of points. Here is the example with a piece of data:

# Given set of points Nx3: np.ndarray
points = np.array([
     [285.679, 219.75,  524.733],
     [285.924, 219.404, 524.812],
     [285.116, 219.217, 524.813],
     [285.839, 219.557, 524.842],
     [285.173, 219.507, 524.606]
        ])
points_t = points.T 
# Array to store results
points_rotated = np.empty((points_t.shape))
# Given rotation matrix 3x3: np.ndarray
rot_matrix = np.array([
        [0.57423549, 0.81862056, -0.01067613],
        [-0.81866133, 0.57405696, -0.01588193],
        [-0.00687256, 0.0178601, 0.99981688]
            ])

for i in range(points.shape[0]):
    points_rotated[:, i] = np.dot(rot_matrix, points_t[:, i])

# Result
points_rotated.T 
# [[ 338.33677093 -116.05910163  526.59831864]
#  [ 338.1933725  -116.45955203  526.6694408 ]
#  [ 337.5762975  -115.90543822  526.67265381]
#  [ 338.26949115 -116.30261156  526.70275207]
#  [ 337.84863885 -115.78233784  526.47047941]]

I dont feel confident using numpy as i quite new to it, so i'm sure there is at least more elegant way to do. I've heard about np.einsum() but i don't understand yet how to implement it here and i'm not sure it will make it quicker. And the main problem is still the computation time, so i want to now how to make it work faster?

Thank you very much in advance!

`points @ rot_matrix.T`, `np.einsum('ij,kj', points, rot_matrix)` is not faster for simple dot products. — Michael Szczesny, Jun 11 '22 at 12:43
Why don't combine both numba and (rot_matrix @ points.T).T together? I guess it will still give a descent speed up by combining this two codes. Or better, if you know Cython, you will see the tremendous improvement in the speed! — , Jun 11 '22 at 16:04

Ali_Sh · Accepted Answer · 2022-06-11T15:51:26.427

1

You can write the code using numba parallelization in no python mode as:

@nb.njit("float64[:, ::1](float64[:, ::1], float64[:, ::1])", parallel=True)
def dot(a, b):  # dot(points, rot_matrix)

    dot_ = np.zeros((a.shape[0], b.shape[0]))
    for i in nb.prange(a.shape[0]):
        for j in range(b.shape[0]):
            dot_[i, j] = a[i, 0] * b[j, 0] + a[i, 1] * b[j, 1] + a[i, 2] * b[j, 2]
    return dot_

It must be better than ko3 answer due to parallelization and signatures and using algebra instead np.dot. I have tested similar code (that was applied just on upper triangle of an array) instead dot product in another answer which showed that was at least 2 times faster (as far as I can remember).

edited Jun 11 '22 at 15:51

answered Jun 11 '22 at 15:46

Ali_Sh

2,667
3
43
66

1

This solution is already ~22x faster with only 2 cores, nearly as fast as `np.matmul`. This is a much bigger improvement than I would have guessed. – Michael Szczesny Jun 11 '22 at 16:01
With your parallelization code, in my system for 1,000,000 x 3 matrix, your code takes 3 ms less than all other methods :D – Jun 11 '22 at 16:34
1

Highly optimized compiled with SIMD instructions (`vmulpd` and `vaddpd`). – Michael Szczesny Jun 11 '22 at 16:36

score 0 · Answer 2 · answered Jun 11 '22 at 16:31

I have tried some codes to see the performance of each solution (based on my system) on 1,000,000 x 3 matrix.

Using np.matmul
Using numba njit with dot product(@)
Using numba njit with your code

Here are the results of three functions.

points = np.random.rand(1000000,3)
rot_matrix = np.array([
        [0.57423549, 0.81862056, -0.01067613],
        [-0.81866133, 0.57405696, -0.01588193],
        [-0.00687256, 0.0178601, 0.99981688]


# Using np.matmul
def function_matmul(points, rot_matrix):
    return np.matmul(rot_matrix @ points.T).T

# Using numba njit with dot product(@)
@njit
def function_numba_dot(points, rot_matrix):
    return (rot_matrix, points.T).T

# Using numba njit with your code
@njit
def function_ori_code(points, rot_matrix):
    points_t = points.T 
    points_rotated = np.empty((points_t.shape))
    for i in range(points.shape[0]):
        points_rotated[:, i] = np.dot(rot_matrix, points_t[:, i])    
    return points_rotated.T 


%timeit function_matmul(points, rot_matrix)
16.5 ms ± 665 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit function_numba_dot(points, rot_matrix)
16.3 ms ± 69.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit function_ori_code(points, rot_matrix)
260 ms ± 4.37 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Have done some research in SO. You can check out this post that utilizes `numexpr` to do the numpy operation. https://stackoverflow.com/a/51949588/16836078 — , Jun 11 '22 at 16:49
Thanks, thats a useful comparison. I'm actually so confused i missed the fact that i dont even need to iterate over array of points with `np.dot` and `np.matmul`... — onthebox, Jun 13 '22 at 16:29

Fastest way to get dot product of a matrix and each point in the given array

2 Answers2

Linked