I implemented a simple linear regression in TensorFlow. If I keep the number of values very small (fewer than 8), it works fine. Once I use a sizeable number of samples (8 or more), though, I get NaNs.
This is perplexing, because it's a simple y = w*x + b regression with no division. The cost is mean squared error, which divides only by a fixed integer (the number of samples), and switching to sum of squared error causes the same problem.
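My setup is essentially the following. This is a minimal sketch using the TF 1.x graph API, not my exact code; the data generation, variable names, and learning rate are placeholders:

```python
import numpy as np
import tensorflow as tf

# Illustrative training data: linear with a little noise (not my exact data).
n_samples = 8
x_train = np.arange(n_samples, dtype=np.float64)
y_train = 3.0 * x_train + 1.0 + np.random.normal(0.0, 0.1, n_samples)

x = tf.placeholder(tf.float64, shape=[n_samples])
y = tf.placeholder(tf.float64, shape=[n_samples])
w = tf.Variable(0.0, dtype=tf.float64)
b = tf.Variable(0.0, dtype=tf.float64)

pred = w * x + b                                        # the model: y = w*x + b
cost = tf.reduce_sum(tf.square(pred - y)) / n_samples   # MSE: divides only by a constant
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost)  # placeholder learning rate

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        _, c, w_val, b_val = sess.run([train_op, cost, w, b],
                                      feed_dict={x: x_train, y: y_train})
        print(step, c, w_val, b_val)
```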
The suggestions in Tensorflow NaN bug? don't help, because there is no division or log in my code.
Finally, adding tf.add_check_numerics_ops() caused the program to run for a very long time, eat up all available memory, start swapping, and finally get terminated.
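For reference, this is roughly how I wired the check op in, continuing the sketch above (again, names are illustrative):

```python
# Build the check op after the rest of the graph is constructed;
# it adds a CheckNumerics assertion for every float tensor in the graph.
check_op = tf.add_check_numerics_ops()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        # Running the check op alongside the training op is where the
        # memory usage exploded for me.
        sess.run([train_op, check_op], feed_dict={x: x_train, y: y_train})
```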
I tried using float32 instead of float64, but it made no difference.
UPDATE
Decreasing the optimizer's learning rate gave me some useful debug output (and also allowed it to work well for up to 12 data points). For some reason, the optimizer tried w = -2.3e+11 and b = -2.2e+10, even though the training data is all linear with a little bit of noise. I'm not sure why or how it got to such crazy w and b values. Once it's there, it's easy to see how it eventually got to NaN.
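To show what I mean by "easy to see how it eventually got to NaN": this tiny plain-NumPy sketch (made-up data and learning rate, not my actual TensorFlow code) reproduces the same blow-up pattern with ordinary gradient descent on the same MSE cost. Each update overshoots, the parameter magnitudes grow geometrically, overflow to inf, and then turn into NaN.

```python
import numpy as np

# Made-up data and learning rate, just to reproduce the blow-up pattern.
x = np.arange(20, dtype=np.float64)   # inputs with moderately large values
y = 3.0 * x + 1.0                     # perfectly linear targets

w, b = 0.0, 0.0
lr = 0.1                              # too large for the scale of x

for step in range(300):
    err = (w * x + b) - y
    grad_w = 2.0 * np.mean(err * x)   # d(MSE)/dw
    grad_b = 2.0 * np.mean(err)       # d(MSE)/db
    w -= lr * grad_w
    b -= lr * grad_b
    if step % 50 == 0:
        # |w| and |b| grow by roughly 20x per step, overflow to inf,
        # and finally become NaN.
        print(step, w, b)
```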
But why did the optimizer try such crazy values? And what can I do to prevent it?