
I'm new to Deep Learning. I'm studying from Udacity.

I came across one of the code examples for building a neural network, where two tensors are added: specifically, the bias tensor and the output of the matrix multiplication.

It looked something like this:

def activation(x):
    # sigmoid activation
    return 1 / (1 + torch.exp(-x))

inputs = images.view(images.shape[0], -1)   # flatten each image into a 784-long vector
w1 = torch.randn(784, 256)                  # weights of the first layer
b1 = torch.randn(256)                       # bias of the first layer
h = activation(torch.mm(inputs, w1) + b1)   # hidden layer output

After flattening the MNIST images, the inputs tensor came out as [64, 784].

I'm not getting how the bias tensor (b1) of dimension [256] can be added to the product of 'inputs' and 'w1', which comes out with dimensions [64, 256].

  • tl;dr [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html) (supports [NumPy broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) semantics) – Berriel Jul 09 '19 at 21:21

3 Answers


In simple terms, whenever we use "broadcasting" from a Python library (NumPy or PyTorch), what we are doing is making our arrays (weights, bias) dimensionally compatible.

In other words, if you are operating on a matrix product of shape [64, 256] and your bias is only [256], broadcasting will fill in the missing dimension.

[Image: broadcasting operation]

As you can see in the image above, the missing dimension is filled in so that the operation can be carried out successfully. Hope this is helpful.
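
For instance, here is a minimal sketch (shapes chosen to match the question; the variable names are just placeholders) of how a [256] bias is stretched to match a [64, 256] matmul output:

import torch

out = torch.randn(64, 256)   # stands in for the result of inputs @ w1
b1 = torch.randn(256)        # 1D bias

summed = out + b1            # b1 is treated as [1, 256] and repeated along dim 0
print(summed.shape)          # torch.Size([64, 256])

# the same addition written out explicitly
manual = out + b1.unsqueeze(0).expand(64, 256)
print(torch.equal(summed, manual))  # True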

  • But I tried adding tensors of size [3,3] and [2,3], and it didn't work. Does it only work for single-dimensional tensors, i.e. either a row tensor or a column tensor? – Aniket Ray Jul 10 '19 at 12:17
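
(A quick sketch of the rule behind that comment, assuming standard PyTorch/NumPy semantics: sizes are compared from the trailing dimension backwards, and each pair must either be equal or contain a 1, so [2,3] cannot broadcast against [3,3], while [1,3] or [3,1] can.)

import torch

a = torch.ones(3, 3)
print((a + torch.ones(1, 3)).shape)   # torch.Size([3, 3]) -- the 1 is stretched to 3
print((a + torch.ones(3, 1)).shape)   # torch.Size([3, 3]) -- works along either dim

try:
    a + torch.ones(2, 3)              # 2 vs 3: neither is 1, so broadcasting fails
except RuntimeError as err:
    print("RuntimeError:", err)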

64 is your batch size, meaning that the bias tensor will be added to each of the 64 examples inside your batch. Basically, it's as if you took 64 tensors of size 256 and added the bias to each of them. PyTorch will naturally broadcast the 256 tensor to a 64x256 size that can be added to the 64x256 output of your previous layer.
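
A rough sketch of that equivalence (the tensor names here are made up for illustration, not the original variables):

import torch

batch_out = torch.randn(64, 256)   # output of the previous layer for 64 examples
bias = torch.randn(256)

broadcast_sum = batch_out + bias   # bias is broadcast to [64, 256]

# the same addition done explicitly, one example at a time
row_by_row = torch.stack([example + bias for example in batch_out])

print(torch.allclose(broadcast_sum, row_by_row))  # True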

– Statistic Dean

This is something called PyTorch broadcasting.

It is very similar to NumPy broadcasting, if you have used that library. Here is an example of adding a scalar to a 2D tensor m.

import torch

m = torch.rand(3, 3)
print(m)
s = 1
print(m + s)   # the scalar is broadcast to every element of m

# tensor([[0.2616, 0.4726, 0.1077],
#         [0.0097, 0.1070, 0.7539],
#         [0.9406, 0.1967, 0.1249]])
# tensor([[1.2616, 1.4726, 1.1077],
#         [1.0097, 1.1070, 1.7539],
#         [1.9406, 1.1967, 1.1249]])

Here is another example, adding a 1D tensor to a 2D tensor.

v = torch.rand(3)
print(v)
print(m+v)

# tensor([0.2346, 0.9966, 0.0266])
# tensor([[0.4962, 1.4691, 0.1343],
#         [0.2442, 1.1035, 0.7805],
#         [1.1752, 1.1932, 0.1514]])

I rewrote your example:

def activation(x):
    return 1 / (1 + torch.exp(-x))

images = torch.randn(3, 28, 28)   # a fake batch of 3 MNIST-sized images
inputs = images.view(images.shape[0], -1)
print("INPUTS:", inputs.shape)

W1 = torch.randn(784, 256)
print("W1:", W1.shape)
B1 = torch.randn(256)
print("B1:", B1.shape)
h = activation(torch.mm(inputs, W1) + B1)

Out

INPUTS: torch.Size([3, 784])
W1: torch.Size([784, 256])
B1: torch.Size([256])

To explain:

INPUTS: of size [3, 784] @ W1: of size [784, 256] will create a tensor of size [3, 256]

Then the addition:

After the mm: [3, 256] + B1: [256] works because B1 is broadcast to the shape [3, 256].
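
To sanity-check those shapes end to end, a small sketch reusing the tensors defined in the snippet above:

out = torch.mm(inputs, W1)     # [3, 784] @ [784, 256] -> [3, 256]
print(out.shape)               # torch.Size([3, 256])

h = activation(out + B1)       # B1 of shape [256] is broadcast to [3, 256]
print(h.shape)                 # torch.Size([3, 256])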

– prosti