
I'm training a Keras neural net in Python. While the model is training, the loss is NaN, and I can't figure out why: there are no NaN values in the input. Here is the code.

    def train_model(self, epochs, batch_size, verbose=1, layer_sizes=[], activation_function='relu',
                    loss='mean_squared_error', optimizer='adam'):
        layer_sizes = list(layer_sizes)
        model = Sequential()
        model.add(Dense(self.features.shape[1], input_dim=self.features.shape[1], kernel_initializer='normal',
                        activation=activation_function))
        for i in range(len(layer_sizes)):
            model.add(Dense(layer_sizes[i], kernel_initializer='normal', activation=activation_function))
        model.add(Dense(self.targets.shape[1], kernel_initializer='normal', activation=activation_function))
        model.compile(loss=loss, optimizer=optimizer)
        model.fit(self.X_train, self.Y_train, epochs=epochs, verbose=verbose, batch_size=batch_size)
        self.model = model

with the following output

    128/857336 [..............................] - ETA: 58:15 - loss: nan
    384/857336 [..............................] - ETA: 21:36 - loss: nan
    640/857336 [..............................] - ETA: 14:12 - loss: nan
    896/857336 [..............................] - ETA: 11:01 - loss: nan

and it continues like this for the rest of training.

Here is my test for NaNs:

    print(df.isnull().values.any())

which prints

    False
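Note that `isnull()` only flags NaN, not infinities, and an `inf` in the features would also drive an MSE loss to nan. A minimal extra check (a sketch with made-up data, not taken from the code above):

```python
import numpy as np
import pandas as pd

# A DataFrame containing an infinity but no NaN: isnull() reports nothing wrong.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0]})
print(df.isnull().values.any())   # False -- no NaN detected
print(np.isinf(df.values).any())  # True  -- but an inf is present
```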

Here is a link to a CSV with sample data:

https://drive.google.com/file/d/1FJqcEmTQ24WebelyLRkGOuPFlSUJt92c/view?usp=sharing

and here is the constructor code

        if data_file == '':
            self.engine = create_engine(
                'postgresql://{}:{}@{}:{}/{}'.format(Model.user, Model.password, Model.host, Model.port,
                                                     Model.database))
            data = [chunk for chunk in
                    pd.read_sql('select * from "{}"'.format(Model.table), self.engine, chunksize=200000)]
            df = pd.DataFrame(columns=data[0].columns)
            for datum in data:
                df = pd.concat([df, datum])
            df.to_hdf('Cleaned_Data.h5', key='df', mode='w')
        else:
            df = pd.read_hdf(data_file)
        df = df.fillna(0)
        df = df.head(1000)
        df.to_csv('Minimum_sample.csv')
        print(df.isnull().values.any())
        columns = list(df.columns)
        misc_data, self.targets, self.features = columns[0:5], columns[6:9], columns[5:6]
        misc_data.extend(columns[9:10])
        misc_data.extend(columns[12:13])
        misc_data.extend(columns[15:16])
        self.targets.extend(columns[10:12])
        self.targets.extend(columns[13:15])
        self.targets.extend(columns[16:26])
        self.features.extend(columns[73:470])
        df = df[misc_data + self.targets + self.features]
        self.targets = df[self.targets].values
        self.features = df[self.features].values
        self.X_train, self.X_test, self.Y_train, self.Y_test = train_test_split(self.features, self.targets,
                                                                                test_size=test_split_size)

Any help would be appreciated!

  • Please provide a [mcve]. – AMC Jun 26 '20 at 22:50
  • model.train_model(1, 128, layer_sizes=[217]) is the line of code that is called. Do you need some sample data as well? – bballboy8 Jun 26 '20 at 22:52
  • _Do you need some sample data as well?_ I think that should be covered on the page I linked. – AMC Jun 26 '20 at 22:54
  • I added sample data to the question as well as the class constructor. The only difference is the data I'm using is stored in an h5 file while the data supplied is a CSV. Let me know if there is anything else! – bballboy8 Jun 26 '20 at 23:04
  • https://stackoverflow.com/questions/40050397/deep-learning-nan-loss-reasons – sailfish009 Jun 26 '20 at 23:07
  • The learning rate is not the issue since the loss never diverges. Furthermore, this is not a classifier. Finally, the question checks for NaN and the only activation function is ReLU. – bballboy8 Jun 26 '20 at 23:13
  • Most likely your network is divergent. Try lowering your learning rate. – Quang Hoang Jun 26 '20 at 23:26
  • The default learning rate with Adam is already 0.001 which is quite low. – bballboy8 Jun 26 '20 at 23:36

1 Answer


You need to standardize your input in some way. Try this:

    from sklearn import preprocessing

    # Fit the scaler on the training split only, then apply it to both splits.
    scalerx = preprocessing.StandardScaler().fit(self.X_train)
    self.X_train = scalerx.transform(self.X_train)
    self.X_test = scalerx.transform(self.X_test)
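For completeness, here is the same fix as a self-contained sketch; the toy arrays below merely stand in for `self.X_train`/`self.X_test` and their shapes are illustrative:

```python
import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Toy data standing in for the question's features/targets.
rng = np.random.default_rng(0)
features = rng.normal(loc=1e6, scale=1e5, size=(100, 10))  # large-magnitude inputs
targets = rng.normal(size=(100, 3))

X_train, X_test, Y_train, Y_test = train_test_split(
    features, targets, test_size=0.2, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits,
# so no information from the test set leaks into the scaling.
scalerx = preprocessing.StandardScaler().fit(X_train)
X_train = scalerx.transform(X_train)
X_test = scalerx.transform(X_test)
```

Fitting on the training split and reusing the same scaler on the test split keeps both sets on the scale the network was trained on, and brings the large-magnitude inputs down to roughly zero mean and unit variance.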
Reza Keshavarz