
I have two ndarrays with shapes:

A = (32,512,640)

B = (4,512)

I need to multiply A and B such that I get a new ndarray:

C = (4,32,512,640)

Another way to think of it is that each row of B is multiplied along axis=-2 of A, which produces a new (1, 32, 512, 640) cube. Looping over the rows of B gives one such (1, 32, 512, 640) cube per row, and those cubes can then be stacked into C using np.concatenate or np.vstack, such as (see the shape check after the snippet):

# Sample inputs, where the dimensions aren't necessarily known
a = np.arange(32*512*465, dtype='f4').reshape((32,512,465))
b = np.ones((4,512), dtype='f4')

# Using a loop
d = []
for row in b:
    d.append(np.expand_dims(row[None,:,None]*a, axis=0))

# Or using list comprehension
d = [np.expand_dims(row[None,:,None]*a,axis=0) for row in b]

# Stacking the final list
result = np.vstack(d)
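For reference, a quick shape check on the stacked output (continuing from the snippet above; the last axis is 465 here only because of the sample inputs):

print(result.shape)   # (4, 32, 512, 465) -- one (32, 512, 465) block per row of b
print(result.dtype)   # float32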

But I am wondering if it's possible to use something like np.einsum or np.tensordot to vectorize this in a single line. I'm still learning how to use those two functions, so I'm not sure whether either is appropriate here.

Thanks!

    What have you tried already? Is it working? If not, in what way is it failing to perform as expected? – Engineero Jun 08 '18 at 16:43
  • Added what I have tried already, though after the helpful answer posted below. It's performing as expected, but I suspected there may be a better or more efficient way to do this (either via performance or less lines of code). – wolfblade87 Jun 08 '18 at 17:03
  • @wolfblade87 btw, `A`, `B`, and `C` are not vectors; they are 3D, 2D, and 4D arrays resp. – kmario23 Jun 08 '18 at 20:13

1 Answer


We can leverage broadcasting after extending the dimensions of B with None/np.newaxis -

C = A * B[:,None,:,None]
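As a quick sanity check of how the inserted axes line up, here is a minimal sketch using random data with the shapes from the question:

import numpy as np

A = np.random.rand(32, 512, 640).astype('f4')
B = np.random.rand(4, 512).astype('f4')

# B[:,None,:,None] has shape (4, 1, 512, 1); broadcasting it against
# A's (32, 512, 640) yields C with shape (4, 32, 512, 640)
C = A * B[:, None, :, None]
print(C.shape)   # (4, 32, 512, 640)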

With einsum, it would be -

C = np.einsum('ijk,lj->lijk',A,B)

There's no sum-reduction happening here, so einsum won't be any faster than the explicit-broadcasting one. But since we are looking for a Pythonic one-liner, it can still be used once we get past its string notation.
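To convince ourselves the two spellings agree, a small equivalence check (reusing A, B and C from the sketch above):

C_einsum = np.einsum('ijk,lj->lijk', A, B)
print(np.allclose(C, C_einsum))   # True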

Let's get some timings to finish things off -

In [15]: m,n,r,p = 32,512,640,4
    ...: A = np.random.rand(m,n,r)
    ...: B = np.random.rand(p,n)

In [16]: %timeit A * B[:,None,:,None]
10 loops, best of 3: 80.9 ms per loop

In [17]: %timeit np.einsum('ijk,lj->lijk',A,B)
10 loops, best of 3: 109 ms per loop

# Original soln
In [18]: %%timeit
    ...: d = []
    ...: for row in B:
    ...:     d.append(np.expand_dims(row[None,:,None]*A, axis=0))
    ...: 
    ...: result = np.vstack(d)
10 loops, best of 3: 130 ms per loop

Leverage multi-core

We could leverage the multi-core capability of numexpr, which is well suited for arithmetic operations on large data, and thus gain some performance boost here. Let's time it -

In [42]: import numexpr as ne

In [43]: B4D = B[:,None,:,None] # this is virtually free

In [44]: %timeit ne.evaluate('A*B4D')
10 loops, best of 3: 64.6 ms per loop

In one line: ne.evaluate('A*B4D', {'A': A, 'B4D': B[:, None, :, None]}).

There is a related post on how to control numexpr's multi-core functionality.
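For instance, a minimal sketch of capping the thread count with numexpr's set_num_threads (the value 4 is just illustrative):

import numexpr as ne

ne.set_num_threads(4)   # limit numexpr to 4 threads for this process
C = ne.evaluate('A*B4D', {'A': A, 'B4D': B[:, None, :, None]})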

Divakar