
I'm training a Keras neural net in Python. While the model is training, the loss is NaN, and I can't figure out why: there are no NaN values in the input. Here is the code.

    def train_model(self, epochs, batch_size, verbose=1, layer_sizes=[], activation_function='relu',
                    loss='mean_squared_error', optimizer='adam'):
        layer_sizes = list(layer_sizes)
        model = Sequential()
        model.add(Dense(self.features.shape[1], input_dim=self.features.shape[1], kernel_initializer='normal',
                        activation=activation_function))
        for i in range(len(layer_sizes)):
            model.add(Dense(layer_sizes[i], kernel_initializer='normal', activation=activation_function))
        model.add(Dense(self.targets.shape[1], kernel_initializer='normal', activation=activation_function))
        model.compile(loss=loss, optimizer=optimizer)
        model.fit(self.X_train, self.Y_train, epochs=epochs, verbose=verbose, batch_size=batch_size)
        self.model = model

with the following output

    128/857336 [..............................] - ETA: 58:15 - loss: nan
    384/857336 [..............................] - ETA: 21:36 - loss: nan
    640/857336 [..............................] - ETA: 14:12 - loss: nan
    896/857336 [..............................] - ETA: 11:01 - loss: nan

and it continues like this for the rest of training.

Here is my test for NaNs:

    print(df.isnull().values.any())

which prints

    False
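Note that `isnull()` only flags NaN, not infinities, and an `inf` in the features would also drive an MSE loss to nan. A minimal extra check (a sketch with made-up data, not taken from the code above):

```python
import numpy as np
import pandas as pd

# A DataFrame containing an infinity but no NaN: isnull() reports nothing wrong.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0]})
print(df.isnull().values.any())   # False -- no NaN detected
print(np.isinf(df.values).any())  # True  -- but an inf is present
```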

Here is a link to a CSV with sample data:

https://drive.google.com/file/d/1FJqcEmTQ24WebelyLRkGOuPFlSUJt92c/view?usp=sharing

and here is the constructor code

        if data_file == '':
            self.engine = create_engine(
                'postgresql://{}:{}@{}:{}/{}'.format(Model.user, Model.password, Model.host, Model.port,
                                                     Model.database))
            data = [chunk for chunk in
                    pd.read_sql('select * from "{}"'.format(Model.table), self.engine, chunksize=200000)]
            df = pd.DataFrame(columns=data[0].columns)
            for datum in data:
                df = pd.concat([df, datum])
            df.to_hdf('Cleaned_Data.h5', key='df', mode='w')
        else:
            df = pd.read_hdf(data_file)
        df = df.fillna(0)
        df = df.head(1000)
        df.to_csv('Minimum_sample.csv')
        print(df.isnull().values.any())
        columns = list(df.columns)
        misc_data, self.targets, self.features = columns[0:5], columns[6:9], columns[5:6]
        misc_data.extend(columns[9:10])
        misc_data.extend(columns[12:13])
        misc_data.extend(columns[15:16])
        self.targets.extend(columns[10:12])
        self.targets.extend(columns[13:15])
        self.targets.extend(columns[16:26])
        self.features.extend(columns[73:470])
        df = df[misc_data + self.targets + self.features]
        self.targets = df[self.targets].values
        self.features = df[self.features].values
        self.X_train, self.X_test, self.Y_train, self.Y_test = train_test_split(self.features, self.targets,
                                                                                test_size=test_split_size)

Any help would be appreciated!

  • Please provide a [mcve]. – AMC Jun 26 '20 at 22:50
  • model.train_model(1, 128, layer_sizes=[217]) is the line of code that is called. Do you need some sample data as well? – bballboy8 Jun 26 '20 at 22:52
  • _Do you need some sample data as well?_ I think that should be covered on the page I linked. – AMC Jun 26 '20 at 22:54
  • I added sample data to the question as well as the class constructor. The only difference is the data I'm using is stored in an h5 file while the data supplied is a CSV. Let me know if there is anything else! – bballboy8 Jun 26 '20 at 23:04
  • https://stackoverflow.com/questions/40050397/deep-learning-nan-loss-reasons – sailfish009 Jun 26 '20 at 23:07
  • The learning rate is not the issue since the loss never diverges. Furthermore, this is not a classifier. Finally, the question checks for NaN and the only activation function is ReLU. – bballboy8 Jun 26 '20 at 23:13
  • Most likely your network is divergent. Try lowering your learning rate. – Quang Hoang Jun 26 '20 at 23:26
  • The default learning rate with Adam is already 0.001 which is quite low. – bballboy8 Jun 26 '20 at 23:36

1 Answer


You need to standardize your input in some way. Try this:

    from sklearn import preprocessing

    # Fit the scaler on the training split only, then apply it to both splits.
    scalerx = preprocessing.StandardScaler().fit(self.X_train)
    self.X_train = scalerx.transform(self.X_train)
    self.X_test = scalerx.transform(self.X_test)
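For completeness, here is the same fix as a self-contained sketch; the toy arrays below merely stand in for `self.X_train`/`self.X_test` and their shapes are illustrative:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Toy data standing in for the question's features/targets.
rng = np.random.default_rng(0)
features = rng.normal(loc=1e6, scale=1e5, size=(100, 10))  # large-magnitude inputs
targets = rng.normal(size=(100, 3))

X_train, X_test, Y_train, Y_test = train_test_split(
    features, targets, test_size=0.2, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits,
# so no information from the test set leaks into the scaling.
scalerx = preprocessing.StandardScaler().fit(X_train)
X_train = scalerx.transform(X_train)
X_test = scalerx.transform(X_test)
```

Fitting on the training split and reusing the same scaler on the test split keeps both sets on the scale the network was trained on, and brings the large-magnitude inputs down to roughly zero mean and unit variance.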
Reza Keshavarz