
I wanted to use the means and stds from training rather than batch statistics, since it seems that if I use batch statistics my model diverges (as outlined here: When should one call .eval() and .train() when doing MAML with the PyTorch higher library?). How does one do that?
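For context, here is a minimal sketch of what I mean (a toy model stands in for my real one): calling .eval() should make BatchNorm normalize with the running estimates accumulated during training rather than with the statistics of the current batch.

import torch
import torch.nn as nn

# toy stand-in for my real model: a conv block with a BatchNorm2d layer
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32, eps=0.001, momentum=0.95),
    nn.ReLU(),
)

# at evaluation time, .eval() makes BatchNorm normalize with the
# running_mean/running_var accumulated during training instead of
# the statistics of the current batch
model.eval()
with torch.no_grad():
    x = torch.randn(4, 3, 28, 28)
    out = model(x)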

I am asking because the running stats in my model are zero, as if no training had been done yet:

args.base_model.model.features.norm1
Out[1]: BatchNorm2d(32, eps=0.001, momentum=0.95, affine=True, track_running_stats=True)
args.base_model.model.features.norm1.running_mean
Out[2]: 
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0.])

Are these not saved in a checkpoint after training? Should they have been saved?

The docs say they should be (https://pytorch.org/docs/stable/_modules/torch/nn/modules/batchnorm.html#BatchNorm2d, https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html):

Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation.

but my running means are zero vectors... :/
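For reference, this is roughly how I am inspecting the checkpoint (the path and the key layout are placeholders for my actual setup):

import torch

# path and key layout are placeholders for my actual checkpoint
ckpt = torch.load('checkpoint.pth', map_location='cpu')
state_dict = ckpt  # or e.g. ckpt['model_state_dict'], depending on how it was saved

# BatchNorm running stats are buffers and live in the state_dict next to
# weight and bias, so a trained checkpoint should show non-zero values here
for name, tensor in state_dict.items():
    if 'running_mean' in name or 'running_var' in name:
        print(name, tensor.abs().sum().item())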



  • I just loaded up a model of mine which uses a bn2 layer, and I have non-zero values in my `running_mean` and `running_var` in my saved `model.pth` checkpoint file. How are you saving your model? – jhso Nov 04 '21 at 23:03
  • For clarification: "I am asking since my model seems to have them be zero despite no training having been done yet", are you saying you have just initialised the model? Or have you trained, and then reloaded the model? – jhso Nov 04 '21 at 23:09
  • @jhso these models should be checkpoint models, not brand new. I have to go right now but will check/answer your questions ASAP. Thanks for the quick response, it's really helpful. – Charlie Parker Nov 04 '21 at 23:10
  • @jhso could this be caused by my interleaving evaluation and training... and ending my code with an evaluation on the val set? – Charlie Parker Nov 05 '21 at 13:54
  • @jhso perhaps my evaluation code should make a "deep copy" and explicitly delete that copy afterwards? But that will likely screw up my GPU memory and result in some type of OOM error or something... – Charlie Parker Nov 05 '21 at 16:07
  • So **the main mystery is to figure out how my models were saved such that the running averages from training got removed** ref: https://discuss.pytorch.org/t/how-does-one-use-the-mean-and-std-from-training-in-batch-norm/136029/5 – Charlie Parker Nov 05 '21 at 19:10
  • Yep, I would assume you do some training while you're in the `model.eval` phase which would never update the batch norm moving mean/std. Therefore, the initialised values (of zeros for mean, ones for std) would never be changed. – jhso Nov 07 '21 at 11:44
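A minimal sketch of jhso's point, using a toy BatchNorm2d layer rather than the actual model: a forward pass in eval mode never touches the running estimates, while a forward pass in train mode updates them.

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(32, eps=0.001, momentum=0.95)
x = torch.randn(8, 32, 4, 4) + 5.0  # batch with a clearly non-zero mean

# eval mode: the forward pass does NOT update the running estimates
bn.eval()
with torch.no_grad():
    bn(x)
print(bn.running_mean.abs().sum())  # still zero

# train mode: the forward pass updates running_mean/running_var
bn.train()
with torch.no_grad():
    bn(x)
print(bn.running_mean.abs().sum())  # non-zero now

So if all the "training" happened while the BatchNorm layers were in eval mode, the saved checkpoint would indeed still contain the initial zeros for running_mean.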

0 Answers