
In the most upvoted answer to this question, it says:

Most layers are initialized using Kaiming Uniform method. Example layers include Linear, Conv2d, RNN etc.

I was actually wondering: where does one find this out? For example, I would like to know the default initialization of torch.nn.Conv2d and torch.nn.BatchNorm2d in PyTorch 1.9.0. For torch.nn.Linear, I found the answer here (in the second answer to the question mentioned above).

Hermi

1 Answer


Convolutional modules such as nn.Conv1d, nn.Conv2d, and nn.Conv3d inherit from the _ConvNd class, which implements a reset_parameters function, just like nn.Linear:

def reset_parameters(self) -> None:
    # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
    # uniform(-1/sqrt(k), 1/sqrt(k)), where k = weight.size(1) * prod(*kernel_size)
    # For more details see: 
    # https://github.com/pytorch/pytorch/issues/15314#issuecomment-477448573
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
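The code comment above claims that kaiming_uniform_ with a=sqrt(5) is equivalent to uniform(-1/sqrt(k), 1/sqrt(k)). A short sanity check of that arithmetic (the fan_in value below is just an illustrative example, not anything specific to your model):

```python
import math

# kaiming_uniform_ with negative-slope parameter `a` samples from
# U(-bound, bound), where bound = gain * sqrt(3 / fan_in) and
# gain = sqrt(2 / (1 + a**2)).
def kaiming_uniform_bound(fan_in: int, a: float) -> float:
    gain = math.sqrt(2.0 / (1.0 + a ** 2))
    return gain * math.sqrt(3.0 / fan_in)

# Example: a Conv2d with in_channels=16 and a 3x3 kernel has
# fan_in = 16 * 3 * 3 = 144.
fan_in = 16 * 3 * 3
bound = kaiming_uniform_bound(fan_in, a=math.sqrt(5))

# With a = sqrt(5), gain = sqrt(1/3), so the bound collapses to
# exactly 1/sqrt(fan_in) -- which is what the comment says.
assert math.isclose(bound, 1 / math.sqrt(fan_in))
```

So the seemingly odd a=sqrt(5) is just a way of expressing the classic uniform(-1/sqrt(k), 1/sqrt(k)) initialization through the kaiming_uniform_ API.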

As for nn.BatchNorm2d, it has reset_parameters and reset_running_stats functions:

def reset_parameters(self) -> None:
    self.reset_running_stats()
    if self.affine:
        init.ones_(self.weight)
        init.zeros_(self.bias)

def reset_running_stats(self) -> None:
    if self.track_running_stats:
        # running_mean/running_var/num_batches... are registered at runtime depending
        # if self.track_running_stats is on
        self.running_mean.zero_()  # type: ignore[operator]
        self.running_var.fill_(1)  # type: ignore[operator]
        self.num_batches_tracked.zero_()  # type: ignore[operator]
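In other words, a freshly constructed BatchNorm2d starts from the identity transform. A minimal standalone sketch (not the real PyTorch class) of what those two functions leave you with:

```python
# Sketch of BatchNorm2d's default state after reset_parameters /
# reset_running_stats: weight (gamma) = 1, bias (beta) = 0,
# running_mean = 0, running_var = 1, num_batches_tracked = 0.
class BatchNormInitSketch:
    def __init__(self, num_features, affine=True, track_running_stats=True):
        self.affine = affine
        self.track_running_stats = track_running_stats
        if affine:
            self.weight = [1.0] * num_features  # init.ones_(self.weight)
            self.bias = [0.0] * num_features    # init.zeros_(self.bias)
        if track_running_stats:
            self.running_mean = [0.0] * num_features  # running_mean.zero_()
            self.running_var = [1.0] * num_features   # running_var.fill_(1)
            self.num_batches_tracked = 0              # num_batches_tracked.zero_()

bn = BatchNormInitSketch(4)
assert bn.weight == [1.0] * 4 and bn.bias == [0.0] * 4
assert bn.running_mean == [0.0] * 4 and bn.running_var == [1.0] * 4
```

With gamma = 1 and beta = 0, the layer initially just normalizes its input without any learned rescaling.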
Ivan
  • Thanks for this helpful answer! I had actually also been searching for the `reset_parameters` function on GitHub, but I had missed that there is a separate file called `conv.py` on GitHub. I have one question about the initialization of the bias, because the paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification" says on page 4: "[...] We also initialize **b** = 0." But this is not done for the bias term in the code that you shared. Do you know why not? – Hermi Sep 13 '21 at 13:04
  • I think you will find this [other question](https://stackoverflow.com/questions/44883861/initial-bias-values-for-a-neural-network) interesting. It discusses the differences between the two initialization methods. – Ivan Sep 13 '21 at 13:18
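As a side note to the comment thread: if you prefer the paper's "b = 0" convention over PyTorch's default uniform bias initialization, you can re-initialize the biases after construction. This is a hedged sketch, not a PyTorch built-in; `zero_bias` is a hypothetical helper name:

```python
import torch
import torch.nn as nn

# Hypothetical helper: zero out the bias of every Linear/Conv2d submodule,
# matching the "b = 0" convention from the Delving Deep into Rectifiers paper.
def zero_bias(module: nn.Module) -> None:
    if isinstance(module, (nn.Linear, nn.Conv2d)) and module.bias is not None:
        nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Linear(8, 2))
model.apply(zero_bias)  # .apply() visits every submodule recursively

assert torch.all(model[0].bias == 0) and torch.all(model[2].bias == 0)
```

`Module.apply` is the idiomatic hook for this kind of post-construction re-initialization, so the weights keep their Kaiming-uniform defaults while only the biases are overridden.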