Let's call x_1 and x_2 the two inputs:

np.einsum('bijc,bijd->bcd', x_1, x_2)

bijc,bijd->bcd boils down to ijc,ijd->cd applied to each batch slice, since the first dimension b is carried through unchanged. Imagine you have c channels of ixj on one hand, and d channels of ixj on the other. The result we're looking for is a cxd matrix. Combining each ixj layer from x_1 (there are c in total) with each ixj layer from x_2 (there are d in total) gives c*d values, which is exactly what we're looking for. Each value is the sum of the Hadamard (element-wise) product of the two ixj layers.
Since c comes first in the output subscripts, c ends up in the first dimension (the rows) while d is the number of columns.
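As a quick check of the per-batch claim, here's a small sketch (shapes chosen arbitrarily for illustration) comparing the batched einsum with the unbatched subscripts applied to each slice:

```python
import numpy as np

rng = np.random.default_rng(0)
x_1 = rng.normal(size=(2, 3, 4, 5))  # b=2, i=3, j=4, c=5
x_2 = rng.normal(size=(2, 3, 4, 6))  # b=2, i=3, j=4, d=6

batched = np.einsum('bijc,bijd->bcd', x_1, x_2)  # shape (2, 5, 6)

# Apply the unbatched recipe ijc,ijd->cd to each batch slice separately:
per_slice = np.stack([np.einsum('ijc,ijd->cd', s1, s2)
                      for s1, s2 in zip(x_1, x_2)])

print(np.allclose(batched, per_slice))  # True
```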
Here's an idea:
import numpy as np

b_s, i_s, j_s, c_s = x_1.shape
d_s = x_2.shape[3]
y = np.zeros((b_s, c_s, d_s))
for b in range(b_s):
    for i in range(i_s):
        for j in range(j_s):
            for c in range(c_s):
                for d in range(d_s):
                    y[b, c, d] += x_1[b, i, j, c] * x_2[b, i, j, d]
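As a sanity check (with made-up random inputs), the loop above should agree with the einsum call:

```python
import numpy as np

rng = np.random.default_rng(1)
x_1 = rng.normal(size=(2, 3, 3, 4))  # b=2, i=3, j=3, c=4
x_2 = rng.normal(size=(2, 3, 3, 5))  # b=2, i=3, j=3, d=5

b_s, i_s, j_s, c_s = x_1.shape
d_s = x_2.shape[3]
y = np.zeros((b_s, c_s, d_s))
for b in range(b_s):
    for i in range(i_s):
        for j in range(j_s):
            for c in range(c_s):
                for d in range(d_s):
                    y[b, c, d] += x_1[b, i, j, c] * x_2[b, i, j, d]

print(np.allclose(y, np.einsum('bijc,bijd->bcd', x_1, x_2)))  # True
```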
This post might give you a better idea.
Also try the following to see what happens in a simple case with i=2, j=2, c=1 and d=1:
a = [[[1], [0]], [[0], [1]]]
b = [[[4], [1]], [[2], [2]]]
np.einsum('ijc,ijd->cd', a, b)
The result is a c*d matrix of size... 1x1 (since c and d are both equal to 1). Here the result is [[6]].
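Spelling out the arithmetic for this small case: the single output entry is the sum of the Hadamard product of the two 2x2 layers, i.e. 1*4 + 0*1 + 0*2 + 1*2 = 6.

```python
import numpy as np

a = np.array([[[1], [0]], [[0], [1]]])  # shape (2, 2, 1): i=2, j=2, c=1
b = np.array([[[4], [1]], [[2], [2]]])  # shape (2, 2, 1): i=2, j=2, d=1

# Sum of the element-wise (Hadamard) product of the two 2x2 layers:
# 1*4 + 0*1 + 0*2 + 1*2 = 6
manual = np.sum(a[:, :, 0] * b[:, :, 0])

print(np.einsum('ijc,ijd->cd', a, b))  # [[6]]
print(manual)                          # 6
```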