loss: nan when fitting training data in tensorflow keras regression network

Question

I am trying to do replicate a regression network from a book but no matter what I try I only get nan losses during the fitting. I have checked and this might happen because of:

bad input data: my data is clean
unscaled input data: I have tried with StandardScaler and MinMaxScaler but no dice
unscaled output data: I have also tried scaling between 0 and 1 using the train set, but new instances will fall outside.
exploding gradients: this might be the case but even with regularization it still happens
learning rate too steep: not even setting it to a low number fixes it
unbounded steps: not even clipping fixes it
error measurement: changing from mse to mean absolute doesn't work either
too big of a batch: reducing the training data to the first 200 entries doesn't work either

What else can be the reason for nans in the loss function?

EDIT: This also happens with all example models around the internet

I am truly out of ideas.

The data looks like this:

X_train[:5]
Out[4]: 
array([[-3.89243447e-01, -6.10268198e-01,  7.23982383e+00,
         7.68512713e+00, -9.15360303e-01, -4.34319791e-02,
         1.69375104e+00, -2.66593858e-01],
       [-1.00512751e+00, -6.10268198e-01,  5.90241386e-02,
         6.22319189e-01, -7.82304360e-01, -6.23993472e-02,
        -8.17899555e-01,  1.52950349e+00],
       [ 5.45617265e-01,  5.78632450e-01, -1.56942033e-01,
        -2.49063893e-01, -5.28447626e-01, -3.67342889e-02,
        -8.31983577e-01,  7.11281365e-01],
       [-1.53276576e-01,  1.84679314e+00, -9.75702024e-02,
         3.03921163e-01, -5.96726334e-01, -6.73883756e-02,
        -7.14616727e-01,  6.56400612e-01],
       [ 1.97163670e+00, -1.56138872e+00,  9.87949430e-01,
        -3.36887553e-01, -3.42869600e-01,  5.08919289e-03,
        -6.86448683e-01,  3.12148621e-01]])

X_valid[:5]
Out[5]: 
array([[ 2.06309546e-01,  1.21271280e+00, -7.86614121e-01,
         1.36422365e-01, -6.81637034e-01, -1.12999850e-01,
        -8.78930317e-01,  7.21259683e-01],
       [ 7.12374210e-01,  1.82332234e-01,  2.24876920e-01,
        -2.22866905e-02,  1.51713346e-01, -2.62325989e-02,
         8.01762978e-01, -1.20954497e+00],
       [ 5.86851369e+00,  2.61592277e-01,  1.86656568e+00,
        -9.86220816e-02,  7.11794858e-02, -1.50302387e-02,
         9.05045806e-01, -1.38915470e+00],
       [-1.81402984e-01, -5.54478959e-02, -6.23050382e-02,
         3.15382948e-02, -2.41326907e-01, -4.58773896e-02,
        -8.74235643e-01,  7.86118754e-01],
       [ 5.02584914e-01, -6.10268198e-01,  8.08807908e-01,
         1.22787966e-01, -3.13107087e-01,  4.73927994e-03,
         1.14447418e+00, -8.00433903e-01]])
y_train[:5]
Out[6]: 
array([[-0.4648844 ],
       [-1.26625476],
       [-0.11064919],
       [ 0.55441007],
       [ 1.19863195]])

y_valid[:5]
Out[7]: 
array([[ 2.018235  ],
       [ 1.25593471],
       [ 2.54525539],
       [ 0.04215816],
       [-0.39716296]])

The code: keras.__version__ 2.4.0

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from tensorflow import keras
import numpy as np

housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full, y_train_full)

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)
print(f'X_train:{X_train.shape}, X_valid: { X_valid.shape}, y_train: {y_train.shape}, y_valid:{y_valid.shape}')
print(f'X_test: {X_test.shape}, y_test: {y_test.shape}')

assert not np.nan in X_train
assert not np.nan in X_valid

scalery=StandardScaler()
y_train=scalery.fit_transform(y_train.reshape(len(y_train),1))
y_valid=scalery.transform(y_valid.reshape(len(y_valid),1))
y_test=scalery.transform(y_test.reshape(len(y_test),1))

#initializers: relu:he_uniform, tanh:glorot
model = keras.models.Sequential([
                                keras.layers.Dense(30, activation="relu",input_shape=X_train.shape[1:]
                                                    , kernel_initializer="he_uniform"
                                                    , kernel_regularizer='l1')
                                ,keras.layers.Dense(1)
                                ])



optimizer = keras.optimizers.SGD(lr=0.0001, clipvalue=1)
model.compile(loss=keras.losses.MeanSquaredError()
              , optimizer=optimizer)

history = model.fit(X_train[0:200], y_train[0:200]
                    , epochs=5
                    ,validation_data=(X_valid[0:20], y_valid[0:20]))

Output:

X_train:(11610, 8), X_valid: (3870, 8), y_train: (11610,), y_valid:(3870,)
X_test: (5160, 8), y_test: (5160,)
Epoch 1/5
7/7 [==============================] - 0s 24ms/step - loss: nan - val_loss: nan
Epoch 2/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Epoch 3/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan
Epoch 4/5
7/7 [==============================] - 0s 5ms/step - loss: nan - val_loss: nan
Epoch 5/5
7/7 [==============================] - 0s 4ms/step - loss: nan - val_loss: nan

Interesting reads (that haven't helped):

score 0 · Accepted Answer · answered Nov 21 '21 at 12:06

0

I have found the answer to my own question:

As it turns out, Tensorflow doesn't work as of now in python 3.10. After downgrading my python version to 3.8 everything started working.

answered Nov 21 '21 at 12:06

Simanamis

1
2

loss: nan when fitting training data in tensorflow keras regression network

1 Answers1