
I implemented a simple linear regression in TensorFlow. With a very small number of samples (fewer than 8) it works fine, but once I use a sizeable amount (8 or more samples), I get NaNs.

This is perplexing, because it's a simple y = w*x + b regression with no division. The cost is Mean Squared Error, which divides only by a fixed integer (the number of samples). Switching to Sum of Squared Error causes the same problem.
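For reference, since the actual code isn't posted, here is a minimal sketch of the kind of setup described, using the TF 1.x graph API; the training data, learning rate, and variable names are assumptions:

```python
import numpy as np
import tensorflow as tf

# Hypothetical training data: linear with a little noise, as described above.
n_samples = 8
train_x = np.arange(n_samples, dtype=np.float64)
train_y = 3.0 * train_x + 1.0 + np.random.normal(scale=0.1, size=n_samples)

x = tf.placeholder(tf.float64, shape=[None])
y = tf.placeholder(tf.float64, shape=[None])
w = tf.Variable(0.0, dtype=tf.float64)
b = tf.Variable(0.0, dtype=tf.float64)

pred = w * x + b                            # y = w*x + b
cost = tf.reduce_mean(tf.square(pred - y))  # Mean Squared Error
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(1000):
        sess.run(train_op, feed_dict={x: train_x, y: train_y})
```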

The answers to Tensorflow NaN bug? won't help, because there is no division or log in my code.

Finally, adding tf.add_check_numerics_ops() caused the program to run for a very long time, eat up all memory, start swapping, and finally get killed.

I tried using float32 instead of float64, but it made no difference.

UPDATE

Decreasing the learning rate of the optimizer allowed me to get some good debug output (and also allowed it to work well for up to 12 data points). For some reason, the optimizer tried w = -2.3e+11 and b = -2.2e+10 (even though the training data is all linear with a little bit of noise). I'm not sure why or how it got to such crazy w and b values. Once it's there, it's easy to see how it eventually got to NaN.
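A sketch of the kind of per-step logging that surfaces the runaway values, reusing the (assumed) names from the sketch above rather than the original code:

```python
# Per-step logging inside the training loop: watch w, b, and the cost,
# and stop as soon as anything becomes non-finite.
for step in range(1000):
    _, w_val, b_val, cost_val = sess.run([train_op, w, b, cost],
                                         feed_dict={x: train_x, y: train_y})
    print("step %d: w=%g b=%g cost=%g" % (step, w_val, b_val, cost_val))
    if not np.isfinite(cost_val):
        print("cost became non-finite at step %d" % step)
        break
```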

But why did the optimizer try such crazy values? And what can I do to prevent it?

  • Can you post your code? – MMN Oct 09 '16 at 13:29
  • You would probably fix this if you added L2 regularization on the weights. You can have an explosion in the weights if your problem is under-constrained, in other words if more than one combination of w/b gives you the same cost. – Yaroslav Bulatov Oct 09 '16 at 21:00
  • Fascinating, will try that and hope to follow up. @YaroslavBulatov, can you explain why that is? – SRobertJames Oct 09 '16 at 23:58
  • Imagine a situation where multiplying all weights by k does not change prediction accuracy (under-constrained). If there's no penalty pushing the weights to be small, then gradient descent is free to grow the weights without bound (or to shrink them without bound). Often the nature of the model makes numerical inaccuracies more likely to grow the weights rather than shrink them. – Yaroslav Bulatov Oct 10 '16 at 00:05
  • Another possible cause is large gradients causing SGD to miscalculate. It happens on the toy input problems from the documentation. The solution is to use a different optimizer. See: http://stackoverflow.com/a/33644778/309334 – Eponymous Mar 26 '17 at 17:54
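Following the L2-regularization suggestion in the comments above, a hedged sketch of adding a weight penalty to the cost from the earlier sketch (the coefficient 1e-3 is an illustrative assumption, not a tuned value):

```python
# Add a small L2 penalty on the weight to the MSE cost.
regularized_cost = cost + 1e-3 * tf.nn.l2_loss(w)
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(regularized_cost)

# Alternatively, per the last comment, use a less brittle optimizer, e.g.:
# train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)
```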

1 Answer


Please take a look at the tutorial for the new TensorFlow Debugger (tfdbg): https://www.tensorflow.org/programmers_guide/debugger

It is specifically designed to make debugging this type of NaN/Inf issue easier.

– Shanqing Cai
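For reference, a minimal sketch (not part of the answer itself) of wrapping the session with tfdbg so it can flag the first NaN/Inf, reusing the names from the sketch in the question:

```python
# Wrap the existing Session with the tfdbg CLI wrapper and register the
# stock has_inf_or_nan filter, as the linked guide describes.
from tensorflow.python import debug as tf_debug

sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# Run a training step as usual; at the tfdbg prompt, `run -f has_inf_or_nan`
# stops at the first Session.run() call that produces a NaN or Inf tensor.
sess.run(train_op, feed_dict={x: train_x, y: train_y})
```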