Is it better for Neural Network to use smaller range of training data or it does not matter? For example, if I want to train an ANN with angles (values of float) should I pass those values in degrees [0; 360] or in radians [0; 6.28] or maybe all values should be normalized to range [0; 1]? Does the range of training data affects ANN learing quality? My Neural Network has 6 input neurons, 1 hidden layer and I am using sigmoid symmetric activation function (tanh).
2 Answers
For the neural network it doesn't matter whether the data is normalised. However, the performance of the training method can vary a lot. In a nutshell: typically the methods prefer variables which have larger values. This might send the training method off-track.
Crucial for most NN training methods is that all dimensions of the training data have the same domain. If all your variables are angles it doesn't matter, whether they are [0,1) or [0,2*pi) or [0,360) as long as they have the same domain. However, you should avoid having one variable for the angle [0,2*pi) and another variable for the distance in mm where distance can be much larger then 2000000mm.
Two cases where an algorithm might suffer in these cases:
(a) regularisation: if the weights of the NN are force to be small a tiny change of a weight controlling the input of a large domain variable has a much larger impact, than for a small domain
(b) gradient descent: if the step size is limited you have similar effects.
Recommendation: All variables should have the same domain size whether it is [0,1] or [0,2*pi] or ... doesn't matter.
Addition: for many domain "z-score normalisation" works extremely well.

- 3,983
- 1
- 19
- 38
The data points range affects the way you train a model. Suppose the range of values for features in the data set is not normalized. Then, depending on your data, you may end up having elongated Ellipses for the data points in the feature space and the learning model will have a very hard time learning the manifold on which the data points lie on (learn the underlying distribution). Also, in most cases the data points are sparsely spread in the feature space, if not normalized (see this). So, the take-home message is to normalize the features when possible.