Here is another approach using np.dot(), which also returns a view as you requested and, most importantly, is more than 4x faster than the tensordot approach for small arrays. However, np.tensordot is much faster than plain np.dot() for reasonably large arrays. See the timings below.
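For reference, here is a minimal setup that reproduces the shapes used in the session below; the actual values of X and betas are not shown in the original, so arbitrary random data is used:

```python
import numpy as np

# arbitrary data matching the shapes used below (values are placeholders)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))         # shape (4, 3)
betas = rng.standard_normal((2, 3, 2))  # shape (2, 3, 2)
```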
In [108]: X.shape
Out[108]: (4, 3)
In [109]: betas.shape
Out[109]: (2, 3, 2)
# use `np.dot` and roll the second axis to first position
In [110]: dot_prod = np.rollaxis(np.dot(X, betas), 1)
In [111]: dot_prod.shape
Out[111]: (2, 4, 2)
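Why this works: for a 2-D X and an N-D betas, np.dot sums over the last axis of X and the second-to-last axis of betas, yielding shape (4, 2, 2); np.rollaxis then moves axis 1 to the front. A sketch verifying this against an explicit loop (using random data of the same shapes, since the original arrays are not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
betas = rng.standard_normal((2, 3, 2))

# np.dot(X, betas) contracts X's last axis with betas' second-to-last axis,
# giving shape (4, 2, 2); rollaxis moves axis 1 to the front -> (2, 4, 2)
dot_prod = np.rollaxis(np.dot(X, betas), 1)

# explicit-loop equivalent: one (4, 3) @ (3, 2) matmul per leading index d
looped = np.stack([X @ betas[d] for d in range(betas.shape[0])])

assert dot_prod.shape == (2, 4, 2)
assert np.allclose(dot_prod, looped)
```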
# @Divakar's approach
In [113]: B = np.tensordot(betas, X, axes=(1,1)).swapaxes(1,2)
# sanity check :)
In [115]: np.all(np.equal(dot_prod, B))
Out[115]: True
Now, the performance of these approaches:
# @Divakar's approach
In [117]: %timeit B = np.tensordot(betas, X, axes=(1,1)).swapaxes(1,2)
10.6 µs ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @hpaulj's approach
In [151]: %timeit esum_dot = np.einsum('np, dpr -> dnr', X, betas)
4.16 µs ± 235 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# proposed approach: more than 4x faster!!
In [118]: %timeit dot_prod = np.rollaxis(np.dot(X, betas), 1)
2.47 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
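As an aside (not part of the timings above), np.matmul broadcasts a 2-D X against the leading axis of a 3-D betas, so the axis rolling can be avoided entirely:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
betas = rng.standard_normal((2, 3, 2))

# matmul treats betas as a stack of (3, 2) matrices and broadcasts X over
# the leading axis, producing (2, 4, 2) directly -- no rollaxis needed
mm = X @ betas

dot_prod = np.rollaxis(np.dot(X, betas), 1)
assert mm.shape == (2, 4, 2)
assert np.allclose(mm, dot_prod)
```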
In [129]: X = np.random.randint(1, 10, (600, 500))
In [130]: betas = np.random.randint(1, 7, (300, 500, 300))
In [131]: %timeit B = np.tensordot(betas, X, axes=(1,1)).swapaxes(1,2)
18.2 s ± 2.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [132]: %timeit dot_prod = np.rollaxis(np.dot(X, betas), 1)
52.8 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
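A plausible explanation for the large-array reversal (hedged; not verified against NumPy internals here) is that np.tensordot collapses the contraction into a single 2-D matrix product, which a BLAS GEMM handles efficiently. A sketch of that reduction, using small arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))          # (n, p)
betas = rng.standard_normal((3, 5, 4))   # (d, p, r)

# tensordot(betas, X, axes=(1, 1)) contracts betas' axis 1 with X's axis 1.
# The same result can be obtained with one big 2-D matmul after reshaping:
d, p, r = betas.shape
n = X.shape[0]
flat = betas.transpose(0, 2, 1).reshape(d * r, p) @ X.T  # (d*r, n)
manual = flat.reshape(d, r, n).transpose(0, 2, 1)        # (d, n, r)

ref = np.tensordot(betas, X, axes=(1, 1)).swapaxes(1, 2)
assert np.allclose(manual, ref)
```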