I don't think your question is unclear, but rather overly pedantic. For example, why are you puzzled by "sum product" in this nD by 1d case, when the docs use "inner product" for the 1d by 1d case, and "matrix product" for the 2d by 2d case? Give yourself some freedom to read it as "sum of the products", as done in the inner product.
To make your example clearer, make w rectangular, to better distinguish row actions from column actions:
In [168]: w=np.array([[1,2,3],[4,5,6]])
     ...: x=np.array([7,2,3])
In [169]: w.shape
Out[169]: (2, 3)
In [170]: x.shape
Out[170]: (3,)
The dot and its equivalent Einstein-summation notation:
In [171]: np.dot(w,x)
Out[171]: array([20, 56])
In [172]: np.einsum('ij,j->i',w,x)
Out[172]: array([20, 56])
The sum of the products is done over the repeated j dimension, with no summation over i.
We can do the same thing with broadcasted elementwise multiplication:
In [173]: (w*x[None,:]).sum(axis=1)
Out[173]: array([20, 56])
While this equivalent operation does use broadcasting, it's better not to think of dot in those terms.
matmul gives another description of the same action: add a dimension to x to form a 2d by 2d matrix product, then squeeze to remove the extra dimension. I don't think dot does that under the covers, but the result is the same. This may also be called matrix-vector multiplication, provided you don't insist on calling the 1d x a row vector or column vector.
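That matmul description can be sketched directly (a minimal check of the equivalence, not a claim about what dot does internally):

```python
import numpy as np

w = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([7, 2, 3])

# matmul accepts the 1d x directly and returns a 1d result:
direct = np.matmul(w, x)

# The "add a dimension, matrix-multiply, squeeze" steps it describes:
expanded = np.matmul(w, x[:, None])     # (2,3) @ (3,1) -> (2,1)
squeezed = expanded.squeeze(axis=1)     # (2,1) -> (2,)

print(direct)    # same values as np.dot(w, x)
print(squeezed)  # identical result by the expand/squeeze route
```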
Now for a 2d x, with shape (3,1):
In [175]: x2 = x[:,None]
In [176]: x2
Out[176]:
array([[7],
[2],
[3]])
In [177]: x2.shape
Out[177]: (3, 1)
In [178]: np.dot(w,x2)
Out[178]:
array([[20],
[56]])
In [179]: np.einsum('ij,jk->ik',w,x2)
Out[179]:
array([[20],
[56]])
The sum is over j, the last axis of w and the second-to-last axis of x2. To do the same elementwise, we have to use broadcasting to generate a 3d outer product, and then sum to reduce back to 2 dimensions.
In [180]: (w[:,:,None]*x2[None,:,:]).sum(axis=1)
Out[180]:
array([[20],
[56]])
In this example a (2,3) dot (3,1) => (2,1). That's perfectly normal matrix product behavior. In the first, (2,3) dot (3,) => (2,). To me this is a logical generalization. (3,) dot (3,) => scalar (as opposed to a () shaped array) is a bit more of a special case.
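To illustrate that 1d by 1d case (using the same x as above):

```python
import numpy as np

x = np.array([7, 2, 3])

# (3,) dot (3,) returns a 0-dimensional scalar, not a () shaped array:
result = np.dot(x, x)   # 7*7 + 2*2 + 3*3
print(result)           # 62
print(np.ndim(result))  # 0
```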
I suspect the first case is mainly a problem for people who see a (3,) shape and think (1,3), a row vector. (2,3) dot (1,3) doesn't work, because of the mismatch between the 3 and the 1.
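A quick sketch of that mismatch (the explicit row shape is my addition, to show why dot rejects it):

```python
import numpy as np

w = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([[7, 2, 3]])   # shape (1, 3): an explicit "row vector"

try:
    np.dot(w, row)            # (2,3) dot (1,3): inner dims 3 and 1 differ
except ValueError as e:
    print(e)

# Transposing aligns the shapes: (2,3) dot (3,1) => (2,1)
print(np.dot(w, row.T))
```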