
I tried to train a UNet on GPU to produce a binary-classified image, but got NaN loss on every epoch. Testing the loss function on its own always returns NaN.

Test case:

import tensorflow as tf
import tensorflow.keras.losses as ls

true = [0.0, 1.0]
pred = [[0.1,0.9],[0.0,1.0]]

tt = tf.convert_to_tensor(true)
tp = tf.convert_to_tensor(pred)

l = ls.SparseCategoricalCrossentropy(from_logits=True)
ret = l(tt,tp)

print(ret) #tf.Tensor(nan, shape=(), dtype=float32)

If I force TF to run on the CPU (see Can Keras with Tensorflow backend be forced to use CPU or GPU at will?), everything works fine. And yes, my UNet fits and predicts correctly on CPU.
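For reference, a minimal sketch of pinning the computation to the CPU along the lines of the linked answer (either approach works; the environment variable must be set before TensorFlow initializes):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide all GPUs; must run before TensorFlow starts

import tensorflow as tf

# Alternatively, pin the ops to the CPU with a device scope:
with tf.device('/CPU:0'):
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(
        tf.convert_to_tensor([0.0, 1.0]),
        tf.convert_to_tensor([[0.1, 0.9], [0.0, 1.0]]),
    )
print(loss)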

I checked several issues on the Keras GitHub, but they all point to problems with the compiled network, such as using an optimizer that is inappropriate for categorical crossentropy.

Is there a workaround? Am I missing something?


2 Answers


I had the same issue. My loss was a real number when I trained on CPU. I tried upgrading the TF version, but that didn't fix the problem. What finally fixed it was reducing the dimensionality of y: my model output was a 2D array, and once I reduced it to 1D I got a real loss on GPU as well.
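For illustration, a minimal sketch of that kind of fix, assuming the labels carried an extra trailing dimension (the shapes here are hypothetical, not taken from the answer):

import tensorflow as tf

y_true_2d = tf.constant([[0.0], [1.0]])         # 2-D labels, shape (2, 1)
y_pred = tf.constant([[0.1, 0.9], [0.0, 1.0]])  # per-class logits

# Squeeze the labels down to 1-D before computing the loss
y_true_1d = tf.squeeze(y_true_2d, axis=-1)      # shape (2,)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(y_true_1d, y_pred))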

– Dani

The test code you have provided works fine on Google Colab.

tf.__version__

Output:

2.3

tf.config.list_physical_devices('GPU')  

Output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]  

Your code:

import tensorflow as tf
import tensorflow.keras.losses as ls

true = [0.0, 1.0]
pred = [[0.1,0.9],[0.0,1.0]]

tt = tf.convert_to_tensor(true)
tp = tf.convert_to_tensor(pred)

l = ls.SparseCategoricalCrossentropy(from_logits=True)
ret = l(tt,tp)

print(ret)  

Result:

tf.Tensor(0.8132616, shape=(), dtype=float32)
  • Then, I guess, it's fixed in newer versions of TF. I was using TF 2.1.0 back then. So technically it's not the answer to the question, but it's still a solution. – Alexandr Crit Oct 22 '20 at 16:49
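If upgrading TF is the fix taken, that is typically just:

pip install --upgrade tensorflow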