loss: nan when building a model for bike sharing

Question

I'm new to machine learning, so please bear with me if the question is poorly phrased or too simple.

My model returns nan as the loss. Please advise me if I have done anything wrong. The details are below.

Program Logic

import numpy as np  # needed below for np.float64
import pandas as pd
import tensorflow as tf
# Reading the csv file from local drive as a dataframe
bike_df = pd.read_csv('C:\\Users\\HOME\\MLPythonPractice\\Data sets\\Bike-Sharing-Dataset\\day.csv')
bike_result_df = pd.read_csv('C:\\Users\\HOME\\MLPythonPractice\\Data sets\\Bike-Sharing-Dataset\\day.csv')

# Remove unwanted columns from the data frame
bike_df = bike_df.drop(columns=['instant','dteday','cnt'])
# shape of the dataframe
print(bike_df.shape)
# Exact attribute to see the columns of the dataframe
print(bike_df.columns)
# To know the type 
print(type(bike_df))
# To see the information of the dataframe
print(bike_df.info())
# Converting from dataframe to ndarray
bike_s = bike_df.values
print(type(bike_s))
print(bike_s.shape)
# Remove all the columns except cnt column which is result set
bike_result_df['cnt'] = bike_result_df['cnt'].values.astype(np.float64)  #converting to float
bike_result_df = bike_result_df['cnt']  # Removing all columns except cnt column
bike_result_s = bike_result_df.values   # Converting dataframe to ndarray
print(type(bike_result_s))
print(bike_result_s)
import numpy as np
print(type(bike_df))
print(bike_df.shape)
print(bike_result_df.shape)
#As the data frame is available, we will build the graph using keras (## are part of build graph)

## Initialise the sequential model
model = tf.keras.models.Sequential()
## Normalize the input data by creating a normalisation layer
model.add(tf.keras.layers.BatchNormalization(input_shape = (13,)))
## Add dense layer for prediction -- Keras declares weights and bias; Dense(1) means one output value
model.add(tf.keras.layers.Dense(1))
# Compile the model - add loss and gradient descent optimiser
model.compile(optimizer='sgd',loss='mse')
print(type(bike_s))
print(type(bike_result_s))
print(bike_s.shape)
print(bike_result_s.shape)
print(bike_result_s)
# Execute the graph
model.fit(bike_s,bike_result_s,epochs=10)
model.save('models/bike_sharing_lr.h5')

I'm getting the output

Epoch 1/10
731/731 [==============================] - 1s 895us/step - loss: nan     
Epoch 2/10
731/731 [==============================] - 0s 44us/step - loss: nan
Epoch 3/10
731/731 [==============================] - 0s 46us/step - loss: nan
Epoch 4/10
731/731 [==============================] - 0s 44us/step - loss: nan
Epoch 5/10
731/731 [==============================] - 0s 39us/step - loss: nan
Epoch 6/10
731/731 [==============================] - 0s 39us/step - loss: nan
Epoch 7/10
731/731 [==============================] - 0s 47us/step - loss: nan
Epoch 8/10
731/731 [==============================] - 0s 40us/step - loss: nan
Epoch 9/10
731/731 [==============================] - 0s 43us/step - loss: nan
Epoch 10/10
731/731 [==============================] - 0s 42us/step - loss: nan

I'm using the data set from this source: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset — Murthy_, Mar 03 '19 at 07:39
Refer to this question. https://stackoverflow.com/questions/40050397/deep-learning-nan-loss-reasons — Jim Todd, Mar 03 '19 at 08:04
Just out of curiosity, what is the objective of your bike program? — Frenchy, Mar 03 '19 at 08:55
@Frenchy, the model's objective is to predict the number of bikes required based on the features provided. — Murthy_, Mar 06 '19 at 13:59

darksky · Accepted Answer · 2019-03-03T19:03:34.573

To prevent your gradients from exploding, you can clip them like so:

model.compile(optimizer=tf.keras.optimizers.SGD(clipnorm=1), loss='mse')

According to https://keras.io/optimizers/, setting clipnorm=1 tells the gradient descent optimizer to clip gradients: all parameter gradients are clipped to a maximum norm of 1. This prevents your loss from diverging.
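To make "clipped to a maximum norm of 1" concrete, here is a minimal NumPy sketch of norm clipping. It only illustrates the idea and is not Keras's actual implementation; the helper name clip_by_norm and the sample gradients are made up for the example.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; the direction is unchanged."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

big = np.array([3000.0, -4000.0])  # hypothetical gradient, L2 norm 5000
small = np.array([0.3, 0.4])       # hypothetical gradient, L2 norm 0.5

clipped_big = clip_by_norm(big, 1.0)      # shrunk to norm 1, same direction
clipped_small = clip_by_norm(small, 1.0)  # already within the limit, left as-is
print(clipped_big, clipped_small)
```

A large gradient is scaled down to norm 1, while a gradient that is already small passes through unchanged.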

See also https://www.dlology.com/blog/how-to-deal-with-vanishingexploding-gradients-in-keras/ for other ways to control exploding gradients.
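As a rough illustration of how an exploding gradient turns into nan, the toy below runs plain gradient descent on mean squared error for a one-weight model whose single feature has values in the thousands. The data, the learning rate of 0.01, and the function name train are assumptions chosen for the sketch; this is not the bike sharing data and not what Keras runs internally.

```python
import numpy as np

# Toy data (hypothetical): one unscaled feature with large values, target y = x.
x = np.array([1000.0, 2000.0, 3000.0])
y = x.copy()

def train(clip, steps=200, lr=0.01):
    """Gradient descent on MSE for the model y_hat = w * x, starting from w = 0."""
    w = 0.0
    # Silence NumPy's overflow/invalid warnings so the divergence is quiet.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(steps):
            grad = 2.0 * np.mean(x * (w * x - y))  # d(MSE)/dw
            if clip:
                # For a single scalar weight, this matches clipping to norm 1.
                grad = np.clip(grad, -1.0, 1.0)
            w = w - lr * grad
    return w

w_plain = train(clip=False)   # overshoots further each step, overflows, ends as nan
w_clipped = train(clip=True)  # each step is at most lr, so w stays finite
print(w_plain, w_clipped)
```

Without clipping, every update overshoots by a larger amount than the last until the weight overflows and becomes nan. With clipping, the weight can move by at most the learning rate per step, so it stays finite but makes slow progress.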

With the clipnorm tweak, the loss no longer diverges, but it doesn't decrease over time either. I also noticed that the way you've set up your model is unusual. Batch normalization should typically follow an activation layer. I'm not sure why you need to normalize your input, but you should not be using BatchNormalization for that. If you change your model to

model = tf.keras.models.Sequential()
# input_shape belongs on the first layer of the model
model.add(tf.keras.layers.Dense(1, activation='relu', input_shape=(13,)))
model.add(tf.keras.layers.BatchNormalization())
model.compile(optimizer='sgd', loss='mse')

you will get a more meaningful result, and the loss function value now decreases from some 20 million to 120,000.
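If you do want scaled inputs without a BatchNormalization layer, one common option is to standardize each feature column before calling model.fit. This is a sketch of that alternative, not something taken from the question's code; the helper name standardize and the random toy array are assumptions for illustration.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (features - mean) / std, mean, std

# Hypothetical stand-in for the feature matrix: 8 rows, 3 unscaled columns.
rng = np.random.default_rng(0)
toy = rng.uniform(0, 1000, size=(8, 3))

scaled, mean, std = standardize(toy)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Keep the returned mean and std so the same scaling can be applied to any data you later pass to model.predict.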

Sorry about my model; I'm still learning. Thank you so much for correcting me. The model now works and gives results other than nan, but the loss is too high. I'm working on it. — Murthy_, Mar 07 '19 at 01:44
