
I'm new to Deep Learning. I'm studying from Udacity.

I came across one of the code examples for building a neural network, where two tensors are added: specifically, the bias tensor and the output of the matrix multiplication.

It looked something like this:

def activation(x):
    # sigmoid activation
    return 1 / (1 + torch.exp(-x))

inputs = images.view(images.shape[0], -1)   # flatten each image into a 784-long vector
w1 = torch.randn(784, 256)                  # weights of the first layer
b1 = torch.randn(256)                       # bias of the first layer
h = activation(torch.mm(inputs, w1) + b1)   # hidden layer output

After flattening the MNIST images, the inputs tensor came out as [64, 784].

I'm not getting how the bias tensor (b1) of dimension [256] can be added to the product of 'inputs' and 'w1', which comes out with dimensions [64, 256].

  • tl;dr [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html) (supports [NumPy broadcasting](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) semantics) – Berriel Jul 09 '19 at 21:21

3 Answers


In simple terms, whenever we use "broadcasting" from a Python library (NumPy or PyTorch), what we are doing is making our arrays (weights, bias) dimensionally compatible.

In other words, if you are operating on a matrix product of shape [64, 256] and your bias is only [256], broadcasting will fill in the missing dimension.

[Image: broadcasting operation]

As you can see in the image above, the missing dimension is filled in so that the operation can be carried out successfully. Hope this is helpful.
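
For instance, here is a minimal sketch (shapes chosen to match the question; the variable names are just placeholders) of how a [256] bias is stretched to match a [64, 256] matmul output:

import torch

out = torch.randn(64, 256)   # stands in for the result of inputs @ w1
b1 = torch.randn(256)        # 1D bias

summed = out + b1            # b1 is treated as [1, 256] and repeated along dim 0
print(summed.shape)          # torch.Size([64, 256])

# the same addition written out explicitly
manual = out + b1.unsqueeze(0).expand(64, 256)
print(torch.equal(summed, manual))  # True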

  • But I tried adding tensors of size [3,3] and [2,3], and it didn't work. Does it only work for single-dimensional tensors, i.e. either a row tensor or a column tensor? – Aniket Ray Jul 10 '19 at 12:17
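
(A quick sketch of the rule behind that comment, assuming standard PyTorch/NumPy semantics: sizes are compared from the trailing dimension backwards, and each pair must either be equal or contain a 1, so [2,3] cannot broadcast against [3,3], while [1,3] or [3,1] can.)

import torch

a = torch.ones(3, 3)
print((a + torch.ones(1, 3)).shape)   # torch.Size([3, 3]) -- the 1 is stretched to 3
print((a + torch.ones(3, 1)).shape)   # torch.Size([3, 3]) -- works along either dim

try:
    a + torch.ones(2, 3)              # 2 vs 3: neither is 1, so broadcasting fails
except RuntimeError as err:
    print("RuntimeError:", err)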

64 is your batch size, meaning that the bias tensor will be added to each of the 64 examples inside your batch. Basically, it's as if you took 64 tensors of size 256 and added the bias to each of them. PyTorch will naturally broadcast the 256 tensor to a 64x256 size that can be added to the 64x256 output of your previous layer.
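
A rough sketch of that equivalence (the tensor names here are made up for illustration, not the original variables):

import torch

batch_out = torch.randn(64, 256)   # output of the previous layer for 64 examples
bias = torch.randn(256)

broadcast_sum = batch_out + bias   # bias is broadcast to [64, 256]

# the same addition done explicitly, one example at a time
row_by_row = torch.stack([example + bias for example in batch_out])

print(torch.allclose(broadcast_sum, row_by_row))  # True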

– Statistic Dean

This is something called PyTorch broadcasting.

It is very similar to NumPy broadcasting, if you have used that library. Here is an example of adding a scalar to a 2D tensor m.

import torch

m = torch.rand(3, 3)
print(m)
s = 1
print(m + s)   # the scalar is broadcast to every element of m

# tensor([[0.2616, 0.4726, 0.1077],
#         [0.0097, 0.1070, 0.7539],
#         [0.9406, 0.1967, 0.1249]])
# tensor([[1.2616, 1.4726, 1.1077],
#         [1.0097, 1.1070, 1.7539],
#         [1.9406, 1.1967, 1.1249]])

Here is another example, adding a 1D tensor to a 2D tensor.

v = torch.rand(3)
print(v)
print(m+v)

# tensor([0.2346, 0.9966, 0.0266])
# tensor([[0.4962, 1.4691, 0.1343],
#         [0.2442, 1.1035, 0.7805],
#         [1.1752, 1.1932, 0.1514]])

I rewrote your example:

def activation(x):
    return 1 / (1 + torch.exp(-x))

images = torch.randn(3, 28, 28)   # a fake batch of 3 MNIST-sized images
inputs = images.view(images.shape[0], -1)
print("INPUTS:", inputs.shape)

W1 = torch.randn(784, 256)
print("W1:", W1.shape)
B1 = torch.randn(256)
print("B1:", B1.shape)
h = activation(torch.mm(inputs, W1) + B1)

Out

INPUTS: torch.Size([3, 784])
W1: torch.Size([784, 256])
B1: torch.Size([256])

To explain:

INPUTS: of size [3, 784] @ W1: of size [784, 256] will create a tensor of size [3, 256]

Then the addition:

After the mm: [3, 256] + B1: [256] works because B1 is broadcast to the shape [3, 256].
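
To sanity-check those shapes end to end, a small sketch reusing the tensors defined in the snippet above:

out = torch.mm(inputs, W1)     # [3, 784] @ [784, 256] -> [3, 256]
print(out.shape)               # torch.Size([3, 256])

h = activation(out + B1)       # B1 of shape [256] is broadcast to [3, 256]
print(h.shape)                 # torch.Size([3, 256])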

– prosti