There are multiple ways you can do broadcasting.
Using transpose
The longer way is the one you are attempting, with transpose. Broadcasting aligns shapes starting from the last dimension, and array a has only 2 dimensions, so they must line up with the last 2 dimensions of b. You therefore move the first 2 dimensions of array b to the end -
import numpy as np
a = np.random.random((84,36))
b = np.random.random((84,36,210,45))
c = b.transpose(2,3,0,1) + a #(210, 45, 84, 36) + (84, 36)
c = c.transpose(2,3,0,1) #transpose back to (84,36,210,45)
c.shape
(84, 36, 210, 45)
Just to clarify here, b.transpose(2,3,0,1) reorders the axes of the 4D array so that the new shape is made up of the original 2nd, 3rd, 0th and 1st dimensions, in that order. Meaning, from (84, 36, 210, 45) -> (210, 45, 84, 36). More clarity here.
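To make the axis reordering concrete, here is a minimal sketch. It assumes smaller stand-in shapes, (4, 3) and (4, 3, 5, 2), instead of the ones in the question so that it runs instantly; the variable names bt and direct_add_works are only for illustration.

```python
import numpy as np

# Small stand-in shapes (an assumption, chosen to keep the example cheap).
a = np.zeros((4, 3))
b = np.zeros((4, 3, 5, 2))

# Broadcasting compares shapes from the last dimension backwards, so
# (4, 3) against (4, 3, 5, 2) pairs 3 with 2 and the addition fails.
try:
    a + b
    direct_add_works = True
except ValueError:
    direct_add_works = False

# transpose(2, 3, 0, 1): new axis i takes its length from old axis (2, 3, 0, 1)[i].
bt = b.transpose(2, 3, 0, 1)

print(direct_add_works)          # False
print(bt.shape)                  # (5, 2, 4, 3)
print(np.shares_memory(bt, b))   # True: transpose returns a view, not a copy
```

After the transpose, the last 2 dimensions of bt match the shape of a, which is why the addition in the transpose method goes through.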
Standard broadcasting by adding axes
The standard way, and the more useful one, is to add 2 trailing dimensions of size 1 to array a. Now both arrays share the first 2 dimensions, and the new size-1 dimensions are stretched to match b during broadcasting.
c = a[..., None, None] + b #(84,36,1,1) + (84, 36, 210, 45)
c.shape
(84, 36, 210, 45)
Just to clarify here, a[..., None, None] adds 2 new axes and turns the 2D tensor of shape (84, 36) into a 4D tensor of shape (84,36,1,1). More clarity here.
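As a small illustration, the new axes can be written in a few equivalent ways. This is a sketch with smaller stand-in shapes (an assumption, to keep it cheap to run), and the names a4_none, a4_newaxis and a4_reshape are only for illustration.

```python
import numpy as np

# Small stand-in shapes (an assumption, not the sizes from the question).
a = np.random.random((4, 3))
b = np.random.random((4, 3, 5, 2))

# Three spellings that all produce shape (4, 3, 1, 1).
a4_none = a[..., None, None]
a4_newaxis = a[:, :, np.newaxis, np.newaxis]   # np.newaxis is an alias for None
a4_reshape = a.reshape(a.shape + (1, 1))

c = a4_none + b

print(a4_none.shape)   # (4, 3, 1, 1)
print(c.shape)         # (4, 3, 5, 2)
# Each a[i, j] is added to the whole (5, 2) block b[i, j].
print(np.array_equal(c[1, 2], a[1, 2] + b[1, 2]))   # True
```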
Finally, just to prove that both methods are equivalent you can check like this -
np.all((b.transpose(2,3,0,1) + a).transpose(2,3,0,1) == a[...,None, None] + b)
True
Benchmarks on larger arrays -
- Transpose method - 1.88 s ± 977 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Standard broadcasting - 1.25 s ± 156 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
However, I noticed something interesting. When the broadcasting dimensions are large, you get a better speedup with the standard method. But when the non-broadcasting dimensions of b are the larger ones, the transpose method seems to be a bit faster than simple broadcasting! I'll analyze this a bit and update my answer, but I definitely found something new to learn here :)
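If you want to try this comparison yourself, here is a rough sketch using the standard timeit module. The helper names (add_via_transpose, add_via_new_axes, compare) and the two test shapes are my own assumptions, not the setup behind the numbers above, and the timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def add_via_transpose(a, b):
    # Move b's first 2 axes to the end, add, then move them back.
    return (b.transpose(2, 3, 0, 1) + a).transpose(2, 3, 0, 1)

def add_via_new_axes(a, b):
    # Give a two trailing size-1 axes so it broadcasts against b.
    return a[..., None, None] + b

def compare(shape_b, repeats=5):
    """Return (best transpose time, best new-axes time) in seconds."""
    rng = np.random.default_rng(0)
    b = rng.random(shape_b)
    a = rng.random(shape_b[:2])
    t_transpose = min(timeit.repeat(lambda: add_via_transpose(a, b),
                                    number=1, repeat=repeats))
    t_new_axes = min(timeit.repeat(lambda: add_via_new_axes(a, b),
                                   number=1, repeat=repeats))
    return t_transpose, t_new_axes

# Same total size, with the large dimensions placed first or last in b.
print(compare((200, 100, 20, 10)))
print(compare((20, 10, 200, 100)))
```

Taking the minimum over several repeats reduces noise from other processes, but treat the results as a rough indication only.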
Which one is better?
I have faced some situations where, due to the nature of the problem, it was necessary to use both methods (e.g., in this bounty I needed to use both). I would advise, however, focusing on the standard method, as it's far more versatile. I will try to comment on the performance of both in a later edit.