
The Problem:

I want to calculate the dot product of a very large set of data. I am able to do this in a nested for-loop, but this is way too slow. Here is a small example:

import numpy as np

points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]])
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]])
x1 = lines[0][0:-1]
y1 = lines[1][0:-1]
L1 = np.asarray([x1, y1])

# calculate the relative length of the projection
# of each point onto each line
a = np.diff(lines)
b = points[:,:,None] - L1[:,None,:]
print(a.shape)
print(b.shape)

[rows, cols, pages] = np.shape(b)
Z = np.zeros((cols, pages))
for k in range(cols):
    for l in range(pages):
        Z[k][l] = a[0][l]*b[0][k][l] + a[1][l]*b[1][k][l]

N = np.linalg.norm(a, axis=0)**2
relativeProjectionLength = np.squeeze(np.asarray(Z/N))

In this example, the first axis (of length 2) of both a and b holds the x- and y-coordinates that I need for the dot product. The shape of a is (2, 7) and the shape of b is (2, 6, 7). Since the dot product reduces this first axis, I would expect the result to have shape (6, 7). How can I calculate this without the slow loops?

What I have tried:

I think that numpy.dot with the correct broadcasting could do the job; however, I have trouble setting up the dimensions correctly.

a = a[:, None, :]
Z = np.dot(a,b)

This gives me the following error:

shapes (2,1,7) and (2,6,7) not aligned: 7 (dim 2) != 6 (dim 1)

  • The kind of broadcasting that works with element-wise operations like `+` and `*` does not apply to `np.dot`. It has its own rules for matching axes. – hpaulj Nov 06 '17 at 16:59

2 Answers


You can use np.einsum -

np.einsum('ij,ikj->kj',a,b)

Explanation:

  • Keep the last axes of the two inputs aligned.

  • Sum-reduce along the first axis of each.

  • Keep whatever remains, which is the second axis of b.

The usual rules on whether to use einsum or stick to a loop-plus-dot based method apply here.
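
For a quick sanity check, a minimal sketch with the arrays from the question should reproduce the loop-based Z exactly:

import numpy as np

points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]])
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]])
L1 = lines[:, :-1]

a = np.diff(lines)                       # (2, 7)
b = points[:, :, None] - L1[:, None, :]  # (2, 6, 7)

# contract the shared first axis, keep the remaining two
Z = np.einsum('ij,ikj->kj', a, b)        # (6, 7)

# loop-based reference from the question
Z_ref = np.zeros_like(Z)
for k in range(b.shape[1]):
    for l in range(b.shape[2]):
        Z_ref[k, l] = a[0, l] * b[0, k, l] + a[1, l] * b[1, k, l]

print(np.allclose(Z, Z_ref))             # True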

Divakar
  • Hello, thank you for the answer. It did the job perfectly. I find the notation of np.einsum very intuitive, as one sees directly which dimensions are reduced! The runtime analysis of different approaches will also make it to my bookmarks. – Olbert Dämmerer Nov 13 '17 at 13:39

numpy.dot does not reduce the first dimension. From the docs:

For N dimensions it is a sum product over the last axis of a and the second-to-last of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

That is exactly what the error is telling you: it is attempting to match axis 2 in the first vector to axis 1 in the second.
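
A quick shape experiment with dummy arrays makes that axis-matching rule concrete:

import numpy as np

a = np.ones((2, 1, 7))
b = np.ones((2, 6, 7))

# np.dot contracts a's last axis (length 7) with b's second-to-last axis
# (length 6); 7 != 6, so the shapes are "not aligned" and a ValueError is raised
try:
    np.dot(a, b)
except ValueError as err:
    print(err)

# with matching contraction axes the call succeeds, but the remaining axes
# of both inputs are stacked: (2, 1, 6) . (2, 6, 7) -> (2, 1, 2, 7)
c = np.ones((2, 1, 6))
print(np.dot(c, b).shape)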

You can fix this using numpy.rollaxis or better yet numpy.moveaxis. Instead of a = a[:, None, :], do

a = np.moveaxis(a, 0, -1)
b = np.moveaxis(b, 0, -2)
Z = np.dot(a, b)

Better yet, you can construct your arrays to have the correct shape up front. For example, transpose lines and do a = np.diff(lines, axis=0).
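
A rough sketch of that layout, assuming the coordinate pair sits on the last axis of every array; note that the final per-pair products here use a broadcasted multiply-and-sum instead of np.dot:

import numpy as np

points = np.array([[0.5, 2, 3, 5.5, 8, 11], [1, 2, -1.5, 0.5, 4, 5]]).T    # (6, 2)
lines = np.array([[0, 2, 4, 6, 10, 10, 0, 0], [0, 0, 0, 0, 0, 4, 4, 0]]).T  # (8, 2)

a = np.diff(lines, axis=0)                 # (7, 2) line direction vectors
L1 = lines[:-1]                            # (7, 2) line start points
b = points[:, None, :] - L1[None, :, :]    # (6, 7, 2)

# per-pair dot products: multiply and sum over the coordinate axis
Z = (a * b).sum(axis=-1)                   # (6, 7)
N = (a ** 2).sum(axis=-1)                  # (7,) squared line lengths
relativeProjectionLength = Z / N           # (6, 7)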

Mad Physicist