Inspired by the question:
Why do different methods for solving Xc=y in python give different solution when they should not?
that seems to have numerical issue due to floating points, inverting matrices and restricting values to [-1,1]
, what I am curious now is why does deep learning not suffer from float or numerical precision errors if most of its training is on data with mean 0 and std 1 (I guess I am assuming that most of the data has been pre-processed to be in that range, plus I feel this has to be roughly right considering the high usage of batch-normalization). Is it because deep learning does not train by raising a polynomial to a very high degree, or why is deep learning usually fine? Is there something special with SGD or maybe the (popular) activation function, relu, elu, etc are not numerically unstable (compared to a high degree polynomial)? Or maybe the GPU training avoids floating point representation all together? Or why is deep learning training numerically stable?