Good morning everyone. We are trying to build a model with hyperparameter tuning on a custom dataset. The problem is that the error is not going down, even though it should be, and the loss values are also very high. When tuning finishes, the output we get is nan. Can someone explain why the loss is not decreasing and why we are getting nan as output? We are using MSE.

import pandas as pd
import tensorflow as tf
from tensorflow import keras

# import hyperparameter tuning requirements
from kerastuner import HyperModel
from tensorflow.keras.layers import (
    Dense,
    Dropout,
)


# load data
ds = pd.read_csv("datasets/ds_extended.csv").dropna()

# preprocess the data: drop the year column, then split into train and test sets
rows = len(ds.index)
jaartallen = ds.pop("jaartallen")  # "jaartallen" = years
# hold out ~10% of the rows for testing; the rest is training data
testdata = ds.sample(n=round(rows / 10))
traindata = pd.concat([ds, testdata]).drop_duplicates(keep=False)
y_train = traindata.pop("loonbelasting(mln)")
y_test = testdata.pop("loonbelasting(mln)")

x_train = traindata
x_test = testdata

# ANN model
class ANNHyperModel(HyperModel):
    def __init__(self, input_shape, num_classes, layers):
        super().__init__()
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.layers = layers

    def build(self, hp):
        model = keras.Sequential()

        # add first layer
        model.add(Dense(
                    units=6, 
                    input_shape=self.input_shape,
                    activation='relu'))

        # add middle layers
        for i in range(self.layers):
            model.add(
                Dense(
                    units=hp.Int(
                        'units',
                        min_value=1,
                        max_value=10,
                        step=5,
                        default=6  # default must lie within the min/max range
                    ),
                    activation=hp.Choice(
                        'dense_activation',
                        values=['relu', 'tanh', 'sigmoid'],
                        default='relu'
                    )
                )
            )
            
            model.add(
                Dropout(
                rate=hp.Float(
                    'dropout',
                    min_value=0.0,
                    max_value=0.5,
                    default=0.25,
                    step=0.05
                    )
                )
            )
            
        # add output layer
        model.add(
            Dense(self.num_classes, activation='softmax'))

        # compile model
        model.compile(
            optimizer=keras.optimizers.Adam(
                hp.Float(
                    'learning_rate',
                    min_value=2E-2,
                    max_value=4E-2,
                    sampling='LOG',
                    default=3E-2)),
            loss='mae',
            metrics=['mse']
            )

        return model

NUM_CLASSES = 62000
INPUT_SHAPE = x_train.iloc[0].shape  # .iloc: row label 0 may have been sampled into the test set
LAYERS = 5

# create hypermodel
hypermodel = ANNHyperModel(num_classes=NUM_CLASSES, input_shape=INPUT_SHAPE, layers=LAYERS)

# perform hyperparameter tuning
from kerastuner.tuners import RandomSearch

MAX_TRIALS = 20
EXECUTION_PER_TRIAL = 10

tuner = RandomSearch(
    hypermodel,
    objective='mse',
    seed=1,
    max_trials=MAX_TRIALS,
    executions_per_trial=EXECUTION_PER_TRIAL,
    directory='random_search',
    project_name='inkomstenbelasting'
)

SEARCH_EPOCHS = 1

tuner.search(x_train, y_train, epochs=SEARCH_EPOCHS, validation_split=0.1)

# get the results
tuner.results_summary()  # prints the summary; the method returns None
best_model = tuner.get_best_models(num_models=1)[0]
loss, mse = best_model.evaluate(x_test, y_test)  # returns [loss, mse]; there is no accuracy for regression

"""
while accuracy == 0:
    best_model.fit(x_train, y_train)"""

This is the output we get:

480/481 [============================>.] - ETA: 0s - loss: 1066199616.0000 - mean_absolute_error: 31202.4492WARNING:tensorflow:Can save best model only with mse available, skipping.
481/481 [==============================] - 26s 55ms/step - loss: 1066175872.0000 - mean_absolute_error: 31202.8555 - val_loss: 2047683840.0000 - val_mean_absolute_error: 45249.4258

And when tuning is done, this is the rest of the output we get:

1/60 [..............................] - ETA: 0s - loss: nan - mse: nan
...
60/60 [==============================] - 1s 13ms/step - loss: nan - mse: nan

This is the head of our dataset:

       btw_laag  btw_hoog  ...     bevolking  vennootschapsbelasting(mln)
0             6      17.5  ...  1.532312e+07                     9459.000
1             6      17.5  ...  1.532329e+07                     9462.041
2             6      17.5  ...  1.532346e+07                     9465.082
3             6      17.5  ...  1.532363e+07                     9468.123
4             6      17.5  ...  1.532381e+07                     9471.164
...         ...       ...  ...           ...                          ...
18995         6      21.0  ...  1.682904e+07                    14500.680
18996         6      21.0  ...  1.682909e+07                    14502.744
18997         6      21.0  ...  1.682914e+07                    14504.808
18998         6      21.0  ...  1.682919e+07                    14506.872
18999         6      21.0  ...  1.682924e+07                    14508.936

1 Answer

Answering here from the comment section for the benefit of the community.

Applying gradient clipping to avoid exploding gradients resolved the nan issue. The very large loss values in the training log above (on the order of 1e9) are a typical symptom of gradients blowing up.
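
For reference, here is a minimal sketch of what that change could look like in the compile step of the question's build() method. The clipnorm=1.0 threshold is an illustrative assumption, not a value given in the comments:

# same compile call as in build(), with gradient clipping enabled;
# clipnorm caps the global norm of each gradient update
# (clipnorm=1.0 is an illustrative guess, tune it for your data)
model.compile(
    optimizer=keras.optimizers.Adam(
        hp.Float(
            'learning_rate',
            min_value=2E-2,
            max_value=4E-2,
            sampling='LOG',
            default=3E-2),
        clipnorm=1.0),
    loss='mae',
    metrics=['mse']
    )

Alternatively, clipvalue can be passed instead of clipnorm to clip each gradient element to a fixed range; both are standard keyword arguments on tf.keras optimizers.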