
I have a tensor of size (1000, 30, 16, 16). I'm doing experiments on how to normalize it. I'm trying to normalize across cases, and maybe across the frequency axis, etc.

The following works:

a = np.random.rand(1000, 30, 16, 16)
a - a.mean(axis=(0, )) #==> it works
a - a.mean(axis=(0, 1)) #==> successful broadcast
a - a.mean(axis=(0, 1, 2)) #==> works well
a - a.mean(axis=(0, 1, 2, 3)) #==> successful broadcast of scalar mean to all a values


#Those however fail:
a - a.mean(axis=(2, 3)) 
#OR:
a - a.mean(axis=(0, 2, 3))

I get:

ValueError: operands could not be broadcast together with shapes (1000, 30, 16, 16) (30,)

It seems that it successfully completes the missing axes in simple cases like (30, 16, 16)

(16, 16)

(16,)

(1,)

But it fails when the missing axes are on the right rather than the left, e.g. (1000, 30), which it cannot broadcast to (1000, 30, 16, 16).

To be specific with my question, how can I dictate how broadcasting is being done? For instance, I have (30,) and I want to broadcast it to (1000, 30, 16, 16)

It throws an error as it fails to broadcast. I have a hacky solution: permuting the axes so that the 30-length axis comes last, which makes broadcasting work. But I'm wondering if there's a way to dictate how broadcasting should be done. And furthermore, why isn't this done automatically?
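For reference, the permutation workaround could look something like this (a sketch using np.moveaxis; the array and variable names are illustrative):

```python
import numpy as np

a = np.random.rand(1000, 30, 16, 16)
m = a.mean(axis=(0, 2, 3))  # shape (30,)

# Move axis 1 to the end so the (30,) mean lines up with the trailing
# axis, subtract, then move the axis back to its original position.
b = np.moveaxis(np.moveaxis(a, 1, -1) - m, -1, 1)
print(b.shape)  # (1000, 30, 16, 16)
```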

Alex Deft

2 Answers


By default, NumPy broadcasts by adding new axes on the left. For example, if an array has shape (30, 16, 16), then it can automatically broadcast up to shape (1, 30, 16, 16). The new axis of length 1 can further broadcast up to any size necessary to match the array it is being broadcasted to.

This explains why broadcasting works in all these cases:

a = np.random.rand(1000, 30, 16, 16)
a - a.mean(axis=(0, )) #==> it works
a - a.mean(axis=(0, 1)) #==> successful broadcast
a - a.mean(axis=(0, 1, 2)) #==> works well
a - a.mean(axis=(0, 1, 2, 3)) #==> successful broadcast of scalar mean to all a values

In each case, a.mean(...) removes axes from the left and (possibly) leaves axes on the right. So broadcasting has no problem automatically adding new axes back on the left.
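This can be checked directly by inspecting the shapes (a small sketch using the array sizes from the question):

```python
import numpy as np

a = np.random.rand(1000, 30, 16, 16)
m = a.mean(axis=(0, 1))  # shape (16, 16): axes removed from the left

# NumPy implicitly pads m on the left to (1, 1, 16, 16) and stretches
# the length-1 axes, so the subtraction broadcasts to a's full shape.
print(m.shape)        # (16, 16)
print((a - m).shape)  # (1000, 30, 16, 16)
```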

In contrast, a - a.mean(axis=(2, 3)) fails because a.mean(axis=(2, 3)) has shape (1000, 30), and can only broadcast up to shapes like (1, 1000, 30) or (1, 1, 1000, 30) and so on. Since a has shape (1000, 30, 16, 16), right-aligning the shapes pits 30 against 16 on the last axis (and 1000 against 16 on the next), so the trailing axes conflict.

To broadcast successfully in this case, you need to explicitly add new axes on the right using

a - a.mean(axis=(2, 3))[..., np.newaxis, np.newaxis]

or

a - a.mean(axis=(2, 3))[..., None, None]

Now a.mean(axis=(2, 3))[..., None, None] has shape (1000, 30, 1, 1) and can broadcast up to (1000, 30, 16, 16) to become compatible in shape with a.
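Alternatively (not covered above, but part of NumPy's reduction API), mean accepts keepdims=True, which retains the reduced axes with length 1 so no manual reindexing is needed:

```python
import numpy as np

a = np.random.rand(1000, 30, 16, 16)

# keepdims=True leaves the reduced axes in place with length 1,
# so the result has shape (1000, 30, 1, 1) and broadcasts directly.
m = a.mean(axis=(2, 3), keepdims=True)
print(m.shape)        # (1000, 30, 1, 1)
print((a - m).shape)  # (1000, 30, 16, 16)
```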


The docs explain broadcasting by saying

Lining up the sizes of the trailing axes of these arrays according to the broadcast rules, shows that they are compatible:

    Image  (3d array): 256 x 256 x 3
    Scale  (1d array):             3
    Result (3d array): 256 x 256 x 3

Notice that alignment is done by right-justifying the shapes. Empty axes are filled with 1s. Saying that new axes are added on the left is just an alternative way of talking about this same idea.
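If your NumPy is recent enough (1.20+), np.broadcast_shapes lets you test this right-justified alignment without building any arrays:

```python
import numpy as np

# np.broadcast_shapes applies the same right-justified alignment rules.
print(np.broadcast_shapes((256, 256, 3), (3,)))  # (256, 256, 3)

# Mismatched trailing axes raise, just like a - a.mean(axis=(2, 3)):
try:
    np.broadcast_shapes((1000, 30, 16, 16), (1000, 30))
except ValueError as e:
    print("broadcast fails:", e)
```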

unutbu
  • Very thorough answer. Plus, you corrected the messed up numbers. Many thanks unutbu – Alex Deft Jul 17 '19 at 00:07
  • With this insight, I'm much more confident about what's happening behind the scenes. I always had some misgivings about whether this is being done exactly like I intend, especially when axes are ambiguous, like (16, 16, 16, 16). – Alex Deft Jul 17 '19 at 00:13
  • The key is that it matches based on position, not value. It doesn't look for another `30` to match with. And to avoid ambiguity, it only adds new dimensions at the front. You have to explicitly add any new trailing dimensions. – hpaulj Jul 17 '19 at 00:45

Instead of broadcasting implicitly, you can create the additional axes for broadcasting explicitly by slicing with None for each axis you want to add.

To broadcast (30,) to (1000,30,16,16), slice like this:

a[None,:,None,None]

You can see that the second axis is sliced with : which means “all the data” and the remaining axes are None which means “create a new axis here for broadcasting”.
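A quick sketch of this (the array names here are illustrative, not from the question):

```python
import numpy as np

v = np.arange(30.0)                   # shape (30,)
big = np.zeros((1000, 30, 16, 16))

# Slicing with None inserts new axes explicitly: the result has shape
# (1, 30, 1, 1), which broadcasts against (1000, 30, 16, 16) unambiguously.
print(v[None, :, None, None].shape)          # (1, 30, 1, 1)
print((big - v[None, :, None, None]).shape)  # (1000, 30, 16, 16)
```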

If you think about it, it's good that implicit broadcasting has strict rules about how it works. Imagine that it could automatically broadcast this way: how would it broadcast (30,) to (30, 30)? It would be ambiguous. With the current rules it is not.
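To see the ambiguity concretely, here is a sketch showing that the two possible placements of (30,) against (30, 30) produce different results:

```python
import numpy as np

v = np.arange(30.0)
m = np.zeros((30, 30))

# Both placements are valid but mean different things, so NumPy cannot
# guess. By rule it right-aligns, so the first line is what m + v does:
rows = m + v[None, :]  # v varies along the last axis (the default)
cols = m + v[:, None]  # v varies along the first axis
print(np.array_equal(rows, cols))  # False
```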

Dietrich Epp