Here's one vectorized approach with np.tensordot,
which should beat broadcasting + summation
any day -
# Take care of "np.dot(x[i],w)" term
x_w = np.tensordot(x,w,axes=((2),(0)))
# Perform "np.dot(w.T,np.dot(x[i],w))" : "np.dot(w.T,x_w)"
y_out = np.tensordot(x_w,w,axes=((1),(0))).swapaxes(1,2)
Alternatively, all of that can be taken care of with one np.einsum
call, though it could be slower -
y_out = np.einsum('ab,cae,eg->cbg',w,x,w)
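To be sure both forms match the original loopy computation, here's a minimal sanity check (assuming x is (m, n, n)-shaped and w is (n, n)-shaped, as implied by np.dot(w.T, np.dot(x[i], w))) -

import numpy as np

m, n = 4, 5
x = np.random.rand(m, n, n)
w = np.random.rand(n, n)

# Vectorized versions
x_w = np.tensordot(x, w, axes=((2),(0)))
y_td = np.tensordot(x_w, w, axes=((1),(0))).swapaxes(1,2)
y_es = np.einsum('ab,cae,eg->cbg', w, x, w)

# Loopy reference : np.dot(w.T, np.dot(x[i], w)) for each i
y_ref = np.array([np.dot(w.T, np.dot(x[i], w)) for i in range(m)])

print(np.allclose(y_td, y_ref), np.allclose(y_es, y_ref))  # expected : True True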
Runtime test -
In [114]: def tensordot_app(x, w):
...: x_w = np.tensordot(x,w,axes=((2),(0)))
...: return np.tensordot(x_w,w,axes=((1),(0))).swapaxes(1,2)
...:
...: def einsum_app(x, w):
...: return np.einsum('ab,cae,eg->cbg',w,x,w)
...:
In [115]: x = np.random.rand(30,50,50)
...: w = np.random.rand(50,50)
...:
In [116]: %timeit tensordot_app(x, w)
1000 loops, best of 3: 477 µs per loop
In [117]: %timeit einsum_app(x, w)
1 loop, best of 3: 219 ms per loop
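On newer NumPy versions (1.12+), np.einsum also accepts an optimize flag that lets it contract in stages instead of forming the full triple product, which should recover most of that gap (not re-timed here, so treat this as a sketch) -

y_out = np.einsum('ab,cae,eg->cbg', w, x, w, optimize=True)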
Giving broadcasting a chance
The sum-notation was -
y[m,i,j] = sum( w[k,i] * x[m,k,l] * w[l,j], axes=[k,l] )
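For reference, that notation translates directly into a (slow) loopy version; a minimal sketch assuming the same shapes as before, with y_ref used only as an illustrative name here -

m, n = x.shape[0], w.shape[0]
y_ref = np.empty((m, n, n))
for mi in range(m):
    for i in range(n):
        for j in range(n):
            # sum over k and l of w[k,i] * x[mi,k,l] * w[l,j]
            y_ref[mi, i, j] = w[:, i].dot(x[mi]).dot(w[:, j])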
Thus, the three terms would be stacked for broadcasting, like so -
w : [ N x k x i x N x N]
x : [ m x k x N x l x N]
w : [ N x N x N x l x j]
where N represents a new axis appended to facilitate broadcasting
along those dims.
With the new axes added via None/np.newaxis,
the terms would look like this -
w : w[None, :, :, None, None]
x : x[:, :, None, :, None]
w : w[None, None, None, :, :]
Thus, the broadcasted product would be -
p = w[None,:,:,None,None]*x[:,:,None,:,None]*w[None,None,None,:,:]
Finally, the output would be a sum-reduction to lose (k,l),
i.e. axes (1,3) -
y = p.sum((1,3))
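As a quick agreement check between the broadcasting route and the tensordot one (small shapes assumed here, since the 5D intermediate p blows up in memory quickly) -

xs = np.random.rand(3, 4, 4)
ws = np.random.rand(4, 4)
p = ws[None,:,:,None,None] * xs[:,:,None,:,None] * ws[None,None,None,:,:]
y_bc = p.sum((1,3))
y_td = np.tensordot(np.tensordot(xs, ws, axes=((2),(0))), ws, axes=((1),(0))).swapaxes(1,2)
print(np.allclose(y_bc, y_td))  # expected : True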