0

I'm sure this is a simple question for someone who specializes in TensorFlow, but I couldn't solve it.

I am trying to run the following code from GitHub:

https://github.com/drhuangliwei/An-Attention-based-Spatiotemporal-LSTM-Network-for-Next-POI-Recommendation

When I run AT-LSTM.py, this code at line 240 reports the training cost:

    if(global_steps%100==0):
        print("the %i step, train cost is: %f"%(global_steps,cost))
    global_steps+=1

Output

    the 100 step, train cost is: nan
    the 200 step, train cost is: nan
    the 300 step, train cost is: nan
    the 400 step, train cost is: nan
    the 500 step, train cost is: nan
    the 600 step, train cost is: nan
    the 700 step, train cost is: nan
    the 800 step, train cost is: nan
    the 900 step, train cost is: nan
    the 1000 step, train cost is: nan
    the 1100 step, train cost is: nan
    the 1200 step, train cost is: nan
    the 1300 step, train cost is: nan
    the 1400 step, train cost is: nan
    the 1500 step, train cost is: nan
    the 1600 step, train cost is: nan
    the 1700 step, train cost is: nan
    the 1800 step, train cost is: nan
    the 1900 step, train cost is: nan
    the 2000 step, train cost is: nan
    the 2100 step, train cost is: nan
    the 2200 step, train cost is: nan
    the 2300 step, train cost is: nan
    the 2400 step, train cost is: nan
    the 2500 step, train cost is: nan
    the 2600 step, train cost is: nan
    the 2700 step, train cost is: nan
    the 2800 step, train cost is: nan
    the 2900 step, train cost is: nan
    the 3000 step, train cost is: nan
    the 3100 step, train cost is: nan
    the 3200 step, train cost is: nan

The cost is NaN at every iteration. Do you have any idea why I am getting NaN values at every step?

drorhun
  • One possible reason is that you might have NaN values in your data. Check for and replace the NaN values with `0` or interpolated values before you use the data for training – sai Dec 19 '20 at 11:18 (a quick sketch of this check follows below)
  • Actually, I was thinking of adding a small bias (1e-4) to the training code on line 98, but my aim is to use this code in a paper and I have to stick to the original. Since I'm new to TensorFlow, I can't solve the problem; I'm sure the original code works properly. – drorhun Dec 19 '20 at 11:28
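Following the first comment's suggestion, a quick way to check the training data for NaNs before feeding it to the model (a minimal sketch using pandas; `train_data.csv` is a hypothetical placeholder for whatever file the repository actually loads):

    import pandas as pd

    # Hypothetical data file; substitute the file the script actually reads
    df = pd.read_csv("train_data.csv")

    # Count NaN values per column
    print(df.isna().sum())

    # Replace NaNs with 0, or use df.interpolate() for interpolated values
    df = df.fillna(0)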

2 Answers

2

A common cause of this in RNNs/LSTMs is exploding gradients; you can avoid it with gradient clipping, e.g. `tf.clip_by_norm` or `tf.clip_by_global_norm` (see How to apply gradient clipping in TensorFlow?).

You can also get this from negative labels or a learning rate that is too large. Also, check the weight initialization.
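For reference, a minimal sketch of clipping by global norm in the TF1 graph style that AT-LSTM.py appears to use; the toy variable, loss, and learning rate below are placeholders for the script's real graph:

    import tensorflow as tf

    learning_rate = 0.001          # placeholder value
    w = tf.Variable(1.0)
    loss = tf.square(w - 2.0)      # placeholder loss; use the model's real loss here

    optimizer = tf.train.AdamOptimizer(learning_rate)
    grads_and_vars = optimizer.compute_gradients(loss)
    grads, variables = zip(*grads_and_vars)
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)  # 5.0 is a common threshold
    train_op = optimizer.apply_gradients(list(zip(clipped_grads, variables)))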

Guinther Kovalski
  • As you said, clipping is applied by the developer between lines 129 and 132 in AT-LSTM.py. Do you think that code block is enough to alleviate the vanishing or exploding gradient? I changed the learning rate and I am still getting NaN values. Could there be a difference between running this code on a high-performance computing cluster and running it on a local machine? I am using my own computer, and the developer was most probably using high-performance computing. – drorhun Dec 21 '20 at 21:47
1

There are a few potential reasons this could be happening. The most common causes are:

  • An exploding gradient
  • A vanishing gradient

Exploding gradients occur when the gradient, well, "explodes" into a very large number. This can be controlled by gradient clipping. A common way to do this is to clip by norm before you apply your gradients. If you control your train_step, you can do it like this:

    def train_step(self, data):
        x_batch, y_true = data

        with tf.GradientTape() as tape:
            logits = self(x_batch, training=True)
            loss = self.compiled_loss(y_true, logits)

        # backprop
        grads = tape.gradient(loss, self.trainable_weights)
        grads = [
            tf.clip_by_norm(g, self.gradient_clip_norm)  # tunable attribute you set on your model
            for g in grads
        ]

        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": loss}

The alternative case, a vanishing gradient, can occur when the error signal cannot propagate back through the entire network. This can happen for a few reasons:

  • Your learning rate may be too high
  • Your network may be very deep

You can try a lower learning rate as a first step; if that still does not work, you could explore residual connections in your network architecture, which can help with vanishing gradients.
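For illustration, a minimal sketch of a residual (skip) connection with the Keras functional API; the layer sizes here are arbitrary placeholders, not the AT-LSTM architecture:

    import tensorflow as tf
    from tensorflow.keras import layers

    inputs = tf.keras.Input(shape=(64,))
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dense(64)(x)
    x = layers.Add()([inputs, x])        # skip connection: output = inputs + f(inputs)
    x = layers.Activation("relu")(x)
    outputs = layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)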

TayTay
  • As you said, clipping is applied by the developer between lines 129 and 132 in AT-LSTM.py. Do you think that code block is enough to alleviate the vanishing gradient? – drorhun Dec 21 '20 at 21:44
  • Clipping will not address a vanishing gradient, usually only an exploding gradient. For a vanishing gradient, try lowering the learning rate first. If that does not help, you might look into adding residual blocks to your network architecture – TayTay Dec 21 '20 at 21:46