
I have a question regarding the z-score normalization method.
This method normalizes the values of a dataset using a mean and a std.
I know that you are normally supposed to use the mean/std of the dataset itself.
But I have seen multiple tutorials on pytorch.org and elsewhere that simply use 0.5 for both the mean and the std, which seems completely arbitrary to me.
Why don't they use the mean/std of the dataset?

Example tutorials that just use 0.5 as mean/std: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
https://medium.com/ai-society/gans-from-scratch-1-a-deep-introduction-with-code-in-pytorch-and-tensorflow-cb03cdcdba0f
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

  • Perhaps linking the tutorial as well would help. – GoodDeeds Feb 13 '20 at 20:41
  • 2
    Because sometimes they are not that relevant, then one provide the mean of the range `[0, 1.]`. I would say that it depends on the bias of the dataset. – Berriel Feb 13 '20 at 21:45
  • Does this answer your question? [How do they know mean and std, the input value of transforms.Normalize](https://stackoverflow.com/questions/57532661/how-do-they-know-mean-and-std-the-input-value-of-transforms-normalize) – kHarshit Feb 14 '20 at 07:44
  • I mean I know where they got the mean and std in the case you linked. They calculated the mean and std of the dataset. What I was wondering about are cases of people who just use 0.5 as mean/std for normalization without calculating the mean/std of the dataset which looks completely arbitrary to me. – Lupos Feb 14 '20 at 15:08

1 Answer


If you normalize a dataset with its own mean and std, the normalized dataset will have a mean of 0 and a std of 1.
However, the minimum and maximum values of the normalized dataset are not guaranteed to fall within any fixed range.
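
A minimal sketch of this (using a random tensor as a stand-in for real image data, not the datasets from the linked tutorials):

```python
import torch

# Stand-in for a dataset of 1000 single-channel 32x32 images with values in [0, 1]
data = torch.rand(1000, 1, 32, 32)

# Normalize with the dataset's own statistics
mean, std = data.mean(), data.std()
normalized = (data - mean) / std

print(normalized.mean().item())  # ~0
print(normalized.std().item())   # ~1
print(normalized.min().item(), normalized.max().item())  # not confined to a fixed range such as [-1, 1]
```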

If you instead use 0.5 for both the mean and the std (on data that is already in the range 0 to 1, e.g. after `ToTensor()`), the normalized dataset will lie in the range -1 to 1, since (x - 0.5) / 0.5 maps [0, 1] onto [-1, 1].
The mean of the normalized dataset will be close to zero and its std close to 0.5 (for typical image data).
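
A minimal sketch of the 0.5/0.5 case using `transforms.Normalize` (again with a random tensor as a stand-in for a `ToTensor()` output in [0, 1]):

```python
import torch
from torchvision import transforms

# ToTensor() scales pixel values to [0, 1]; Normalize((0.5,), (0.5,)) then
# applies (x - 0.5) / 0.5, which maps [0, 1] onto [-1, 1].
normalize = transforms.Normalize((0.5,), (0.5,))

img = torch.rand(1, 32, 32)   # stand-in for a single-channel image tensor in [0, 1]
out = normalize(img)

print(out.min().item(), out.max().item())  # values stay within [-1, 1]
```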

So, to answer my own question: you use 0.5 as mean/std when you want the dataset to lie in the range -1 to 1.
This is beneficial when using, for example, a tanh activation function in a neural network.
