Here is another approach using np.dot(), which also returns a view as you requested and, most importantly, is more than 4x faster than the tensordot approach for small arrays. However, np.tensordot is much faster than plain np.dot() for reasonably large arrays. See the timings below.
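For reference, here is a minimal setup that reproduces the shapes used in the session below; the actual values of X and betas are not shown in the original, so arbitrary random data is used:

```python
import numpy as np

# arbitrary data matching the shapes used below (values are placeholders)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))         # shape (4, 3)
betas = rng.standard_normal((2, 3, 2))  # shape (2, 3, 2)
```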
In [108]: X.shape
Out[108]: (4, 3)
In [109]: betas.shape
Out[109]: (2, 3, 2)
# use `np.dot` and roll the second axis to first position
In [110]: dot_prod = np.rollaxis(np.dot(X, betas), 1)
In [111]: dot_prod.shape
Out[111]: (2, 4, 2)
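Why this works: for a 2-D X and an N-D betas, np.dot sums over the last axis of X and the second-to-last axis of betas, yielding shape (4, 2, 2); np.rollaxis then moves axis 1 to the front. A sketch verifying this against an explicit loop (using random data of the same shapes, since the original arrays are not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
betas = rng.standard_normal((2, 3, 2))

# np.dot(X, betas) contracts X's last axis with betas' second-to-last axis,
# giving shape (4, 2, 2); rollaxis moves axis 1 to the front -> (2, 4, 2)
dot_prod = np.rollaxis(np.dot(X, betas), 1)

# explicit-loop equivalent: one (4, 3) @ (3, 2) matmul per leading index d
looped = np.stack([X @ betas[d] for d in range(betas.shape[0])])

assert dot_prod.shape == (2, 4, 2)
assert np.allclose(dot_prod, looped)
```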
# @Divakar's approach
In [113]: B = np.tensordot(betas, X, axes=(1,1)).swapaxes(1,2)
# sanity check :)
In [115]: np.all(np.equal(dot_prod, B))
Out[115]: True
Now, the performance of these approaches:
# @Divakar's approach
In [117]: %timeit B = np.tensordot(betas, X, axes=(1,1)).swapaxes(1,2)
10.6 µs ± 2.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# @hpaulj's approach
In [151]: %timeit esum_dot = np.einsum('np, dpr -> dnr', X, betas)
4.16 µs ± 235 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# proposed approach: more than 4x faster!!
In [118]: %timeit dot_prod = np.rollaxis(np.dot(X, betas), 1)
2.47 µs ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
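As an aside (not part of the timings above), np.matmul broadcasts a 2-D X against the leading axis of a 3-D betas, so the axis rolling can be avoided entirely:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
betas = rng.standard_normal((2, 3, 2))

# matmul treats betas as a stack of (3, 2) matrices and broadcasts X over
# the leading axis, producing (2, 4, 2) directly -- no rollaxis needed
mm = X @ betas

dot_prod = np.rollaxis(np.dot(X, betas), 1)
assert mm.shape == (2, 4, 2)
assert np.allclose(mm, dot_prod)
```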
In [129]: X = np.random.randint(1, 10, (600, 500))
In [130]: betas = np.random.randint(1, 7, (300, 500, 300))
In [131]: %timeit B = np.tensordot(betas, X, axes=(1,1)).swapaxes(1,2)
18.2 s ± 2.41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [132]: %timeit dot_prod = np.rollaxis(np.dot(X, betas), 1)
52.8 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
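A plausible explanation for the large-array reversal (hedged; not verified against NumPy internals here) is that np.tensordot collapses the contraction into a single 2-D matrix product, which a BLAS GEMM handles efficiently. A sketch of that reduction, using small arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5))          # (n, p)
betas = rng.standard_normal((3, 5, 4))   # (d, p, r)

# tensordot(betas, X, axes=(1, 1)) contracts betas' axis 1 with X's axis 1.
# The same result can be obtained with one big 2-D matmul after reshaping:
d, p, r = betas.shape
n = X.shape[0]
flat = betas.transpose(0, 2, 1).reshape(d * r, p) @ X.T  # (d*r, n)
manual = flat.reshape(d, r, n).transpose(0, 2, 1)        # (d, n, r)

ref = np.tensordot(betas, X, axes=(1, 1)).swapaxes(1, 2)
assert np.allclose(manual, ref)
```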