11

I am training a multi-class model (40 classes in total) for 2000 epochs. The model runs fine through epoch 828, but at epoch 829 it raises an InvalidArgumentError (see the screenshot below).

[Screenshot of the InvalidArgumentError; image not available]

Below is the code that I used to build my model.

n_cats = 40 
input_bow = tf.keras.Input(shape=(40), name="bow")
hidden_1 = tf.keras.layers.Dense(200, activation="relu")(input_bow)

hidden_2 = tf.keras.layers.Dense(100, activation="relu")(hidden_1)

hidden_3 = tf.keras.layers.Dense(80, activation="relu")(hidden_2)

hidden_4 = tf.keras.layers.Dense(70, activation="relu")(hidden_3)

output = tf.keras.layers.Dense(n_cats, activation="sigmoid")(hidden_4)

model = tf.keras.Model(inputs=[input_bow], outputs=output)

METRICS = [
    tf.keras.metrics.Accuracy(name="Accuracy"),
    tf.keras.metrics.Precision(name="precision"),
    tf.keras.metrics.Recall(name="recall"),
    tf.keras.metrics.AUC(name="auc"),
    tf.keras.metrics.BinaryAccuracy(name="binaryAcc")
]

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    "my_keras_model.h5", save_best_only=True)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=1e-2,
                                                             decay_steps=10000,
                                                             decay_rate=0.9)


adam_optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(loss="categorical_crossentropy",
              optimizer="adam", metrics=METRICS)

training_history = model.fit(
    (bow_train),
    indus_cat_train,
    epochs=2000,
    batch_size=128,
    callbacks=[checkpoint_cb],
    validation_data=(bow_test, indus_cat_test))

Please help me understand this behavior of TensorFlow. What is causing this error? I have read this and this, but neither seems to explain my case.

learner

4 Answers

22

I think this error comes from the AUC metric (see https://www.tensorflow.org/api_docs/python/tf/keras/metrics/AUC). The metric expects all predictions to be non-negative, but your model is outputting [-nan, -nan, ...]. You can try the suggestions at http://deeplearning.net/software/theano/tutorial/nan_tutorial.html to deal with the NaNs. If you just want to get rid of the error quickly, you can remove the AUC metric from the list.
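
To illustrate the diagnosis above, here is a minimal NumPy sketch for inspecting a batch of predictions (for example the output of `model.predict`) before it reaches a metric. The helper name `check_predictions` and the sample batch are hypothetical and not part of the original code. Keras also provides the `tf.keras.callbacks.TerminateOnNaN` callback, which stops training once the loss becomes NaN and may help locate the epoch where things go wrong.

```python
import numpy as np

def check_predictions(preds):
    """Summarise whether an array of predictions is safe to score.

    Returns the number of NaN entries and whether every finite
    entry lies in the [0, 1] range.
    """
    preds = np.asarray(preds, dtype=float)
    nan_count = int(np.isnan(preds).sum())
    finite = preds[np.isfinite(preds)]
    in_unit_range = bool(np.all((finite >= 0.0) & (finite <= 1.0)))
    return {"nan_count": nan_count, "in_unit_range": in_unit_range}

# Hypothetical batch: the second row mimics the all-NaN output from the error.
batch = np.array([[0.1, 0.9, 0.3], [np.nan, np.nan, np.nan]])
report = check_predictions(batch)
print(report)
```

A non-zero `nan_count` suggests the NaNs originate in the model itself (for instance from a diverging loss), and that the metric is only where they first become visible.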

awilliea
3

I had the exact same problem in my multilabel classification LSTM model. During tuning I found that the larger the learning rate, the more likely this error is to occur. Your specification of initial_learning_rate=1e-2 might already be too high for your problem. For my model, I experienced the following:

lr=0.1 -> error always occurs

lr=0.01 -> error occurs very rarely

lr=0.05 -> error has never occurred (so far)

These values are based solely on my observations during tuning with early stopping, so I assume that the risk of this error is actually higher for full training runs. The error also seemed to be independent of the neural net's topology.

The answer above by @awilliea states that the error is related to the AUC metric. I cannot say for sure whether that is correct, but I can at least confirm that removing AUC and some other metrics, as suggested, would have worked for my problem too: when I tested my model without these metrics, the error never occurred at any learning rate. Most problems need those metrics, though, so I suggest solving the problem via the learning rate instead.
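
To make the learning-rate discussion concrete, the sketch below evaluates the documented (non-staircase) formula behind `ExponentialDecay`, `initial_learning_rate * decay_rate ** (step / decay_steps)`, with the values from the question. The function name `exponential_decay` is a hypothetical stand-in written in plain Python, not the TensorFlow class itself.

```python
def exponential_decay(step, initial_lr=1e-2, decay_steps=10000, decay_rate=0.9):
    """Learning rate after `step` optimizer steps (non-staircase form)."""
    return initial_lr * decay_rate ** (step / decay_steps)

lr_start = exponential_decay(0)            # rate at the first step
lr_after_10k = exponential_decay(10000)    # one full decay period later
lr_after_100k = exponential_decay(100000)  # ten decay periods later

for step in (0, 10000, 100000):
    print(step, exponential_decay(step))
```

With these settings the rate drops by only 10% per 10,000 steps, so it stays close to 1e-2 for a long time. Note also that the question's `model.compile` call passes the string "adam" rather than `adam_optimizer`; as written, Keras would build a default Adam optimizer (learning rate 0.001) and the schedule would not be applied.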

Viktor
0

In your output Dense layer, you have to set the activation function to "softmax", since this is a multi-class classification problem.

Also, metrics like "binaryAcc" and "AUC" won't work here, as they are meant specifically for binary classification.
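
The difference between the two activations can be sketched in NumPy as follows. The logits are made-up values for a hypothetical 3-class case, and the functions are plain reimplementations for illustration, not the Keras ones.

```python
import numpy as np

def sigmoid(z):
    """Element-wise sigmoid: each class is scored independently."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax over the last axis: scores compete and sum to 1."""
    shifted = z - np.max(z, axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])  # hypothetical pre-activation outputs
sigmoid_out = sigmoid(logits)
softmax_out = softmax(logits)

print("sigmoid sum:", sigmoid_out.sum())
print("softmax sum:", softmax_out.sum())
```

Softmax outputs form a probability distribution over the classes, which is what "categorical_crossentropy" assumes. Sigmoid outputs are independent per-class scores that need not sum to 1.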

0

This is fixed in TensorFlow 2.10: https://github.com/keras-team/keras/issues/15715#issuecomment-1100795008

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/33933199) – Roshin Raphel Mar 05 '23 at 11:20