Context
I am building a neural network for multi-label classification: identifying labels in an image (which clothes a person is wearing, their colors, etc.). I wanted to use pure TensorFlow (instead of a higher-level API like Keras) in order to have more flexibility over my metrics.
P.S.: The data used for this TensorFlow model was also tested with a model built in Keras, and it did not produce the issues that I describe here.
Data
My input data are (X,Y): X has shape (1814,204,204,3) and Y has shape (1814,39). So X is the set of images and Y holds the labels associated with each image, which are used for the supervised learning process.
There are 39 labels in total, so with every image of shape (1,204,204,3) we associate a vector of shape (1,39) whose 39 values are each 0 or 1: 1 if the corresponding label is present in that image, 0 otherwise. Several labels can be present at the same time, which means that this is not one-hot encoding and not a multi-class classification situation!
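To make the label format concrete, here is a minimal sketch of a multi-hot vector. The label names are hypothetical and only five are used for brevity; the real dataset has 39 labels.

```python
# Hypothetical label names, for illustration only (the real dataset has 39 labels).
LABELS = ["shirt", "jeans", "red", "blue", "hat"]


def multi_hot(present, labels=LABELS):
    """Return a 0/1 vector with a 1 for every label present in the image."""
    return [1 if name in present else 0 for name in labels]


y = multi_hot({"shirt", "blue"})
print(y)       # [1, 0, 0, 1, 0]
print(sum(y))  # 2, more than one label is active, so this is not one-hot
```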
PS: I already normalized my data so that the values lie in [0,1].
What I have done
1. The first thing I did was build the abstract version of my classifier (which is a CNN).
Here is the structure of my CNN:
# Convolutional Layer 1
# Dropout layer 1
# Convolutional Layer 2
# Pooling Layer 2
# Dense layer 3
# Dropout layer 3
# Dense layer 4
For a given dataset of shape (?,204,204,3), here is the flow of the data through the different layers:
conv1 OUTPUT shape: (?, 204, 204, 32)
drop1 OUTPUT shape: (?, 204, 204, 32)
conv2 OUTPUT shape: (?, 204, 204, 32)
pool2 OUTPUT shape: (?, 102, 102, 32)
dense3 OUTPUT shape: (?, 512)
drop3 OUTPUT shape: (?, 512)
dense4 OUTPUT shape: (?, 39)
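The shapes above can be reproduced with a little arithmetic. The sketch below assumes stride-1 convolutions with "same" padding and 2x2 max pooling with stride 2 and "valid" padding, and the helper names are made up for illustration. It also counts the parameters of the first dense layer, which follow from the flattened size.

```python
def conv2d_same(shape, filters):
    """Output shape of a stride-1 convolution with "same" padding."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_valid(shape, pool=2, stride=2):
    """Output shape of max pooling with "valid" padding."""
    height, width, channels = shape
    return ((height - pool) // stride + 1, (width - pool) // stride + 1, channels)


shape = (204, 204, 3)
shape = conv2d_same(shape, 32)   # conv1 (dropout keeps the shape)
shape = conv2d_same(shape, 32)   # conv2
shape = max_pool_valid(shape)    # pool2
flat_size = shape[0] * shape[1] * shape[2]
dense3_params = flat_size * 512 + 512  # weights + biases of the 512-unit layer

print(shape)          # (102, 102, 32)
print(flat_size)      # 332928
print(dense3_params)  # 170459648
```

So each unit of dense3 sums over 332928 inputs, and that layer alone holds about 170 million parameters.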
Here is the code for building the structure of the CNN:
import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.placeholder, tf.Session)

def create_model(X,Y):
    # Convolutional Layer #1
    conv1 = tf.layers.conv2d(
        inputs=X,
        filters=32,
        kernel_size=[3, 3],
        padding="same",
        activation=tf.nn.relu)
    print('conv1 OUTPUT shape: ',conv1.shape)
    # Dropout layer #1
    dropout1 = tf.layers.dropout(
        inputs=conv1, rate=0.2, training='TRAIN' == tf.estimator.ModeKeys.TRAIN)
    print('drop1 OUTPUT shape: ',dropout1.shape)
    # Convolutional Layer #2
    conv2 = tf.layers.conv2d(
        inputs=dropout1,
        filters=32,
        kernel_size=[3, 3],
        padding="same",
        activation=tf.nn.relu)
    print('conv2 OUTPUT shape: ',conv2.shape)
    # Pooling Layer #2
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2],strides=2)
    print('pool2 OUTPUT shape: ',pool2.shape)
    # Flatten the feature maps before the dense layers
    pool2_flat = tf.reshape(pool2, [-1, pool2.shape[1]*pool2.shape[2]*pool2.shape[3]])
    # Dense layer #3
    dense3 = tf.layers.dense(inputs=pool2_flat, units=512, activation=tf.nn.relu)
    print('dense3 OUTPUT shape: ',dense3.shape)
    # Dropout layer #3
    dropout3 = tf.layers.dropout(
        inputs=dense3, rate=0.5, training='TRAIN' == tf.estimator.ModeKeys.TRAIN)
    print('drop3 OUTPUT shape: ',dropout3.shape)
    # Dense layer #4 (sigmoid output: one independent probability per label)
    Z = tf.layers.dense(inputs=dropout3, units=39, activation=tf.nn.sigmoid)
    print('dense4 OUTPUT shape: ',Z.shape)
    return Z
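A side note on the `training` argument of the two dropout layers: as far as I can tell, `tf.estimator.ModeKeys.TRAIN` is the lowercase string 'train' in TensorFlow 1.x, so the comparison with 'TRAIN' would evaluate to False and dropout would stay inactive. The sketch below mimics this in plain Python. `MODE_KEYS_TRAIN` is an assumed stand-in for the TensorFlow constant, and the `dropout` helper is a hypothetical toy version of inverted dropout, not the library implementation.

```python
import random

# Assumption: stand-in for tf.estimator.ModeKeys.TRAIN ("train" in TensorFlow 1.x).
MODE_KEYS_TRAIN = "train"


def dropout(values, rate, training, seed=0):
    """Toy inverted dropout on a flat list; identity when training is False."""
    if not training:
        return list(values)
    rng = random.Random(seed)
    keep = 1.0 - rate
    # Kept values are scaled by 1/keep so the expected sum is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


flag_as_written = "TRAIN" == MODE_KEYS_TRAIN  # the case differs, so this is False
activations = [1.0, 2.0, 3.0, 4.0]

print(flag_as_written)                                          # False
print(dropout(activations, rate=0.5, training=flag_as_written))  # unchanged input
print(dropout(activations, rate=0.5, training=True))             # some values zeroed
```

If this reading is right, one option would be to pass an explicit boolean (or a boolean placeholder) as `training` instead of the string comparison.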
2. Now I define my cost function and my optimizer.
- For the cost function I am using sigmoid cross-entropy (written out by hand on the sigmoid outputs), and I calculate the mean independently for each of my output components over the batch. For example, if I have a batch of size 10, the output of the model has shape (10,39), so the cost is a vector of shape (1,39) (for each label we calculate the mean over the different examples in the batch).
- For the optimizer I am using the Adam optimizer.
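The per-label averaging can be illustrated with plain Python lists. The loss values are hypothetical, and the batch here has 2 examples and 3 labels instead of 10 and 39. The last line assumes that minimizing a vector-valued cost amounts to minimizing the sum of its elements, which is my understanding of how gradients of a non-scalar tensor are accumulated.

```python
def per_label_mean(losses):
    """Mean over the batch axis (axis 0) for each label column."""
    batch_size = len(losses)
    return [sum(column) / batch_size for column in zip(*losses)]


# Hypothetical per-example, per-label losses: 2 examples, 3 labels.
losses = [
    [0.5, 1.0, 0.25],
    [1.5, 3.0, 0.75],
]
cost = per_label_mean(losses)
print(cost)       # [1.0, 2.0, 0.5]
print(sum(cost))  # 3.5, the scalar effectively minimized under the assumption above
```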
Here is the code for calculating the cost and optimizer:
def optimizer_and_cost(output,labels):
    # Calculating cost: per-label cross-entropy, averaged over the batch (axis=0)
    cost= tf.reduce_mean(labels * - tf.log(output) + (1 - labels) * - tf.log(1 - output),axis=0)
    print('cost: shape of cost: ',cost.shape)
    cost= tf.reshape(cost, [1, 39])
    print('cost reshaped: shape of cost reshaped: ',cost.shape)
    # Optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
    return optimizer,cost
PS: The 'axis=0' in tf.reduce_mean is what allows me to calculate, for each label independently, the mean over the batch examples!
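One thing that may be worth checking in a hand-written formula of this shape: if a sigmoid output saturates to exactly 0 or 1, one of the logarithms becomes -inf, and multiplying it by a zero label term gives NaN. The sketch below reproduces this in plain Python and compares it with the formulation documented for `tf.nn.sigmoid_cross_entropy_with_logits`, which works on the logit x and label z as max(x, 0) - x*z + log(1 + exp(-|x|)). The helper names are made up, and `log_like_tf` is an assumed stand-in that returns -inf for 0, as a tensor log would, instead of raising.

```python
import math


def log_like_tf(p):
    """Assumed stand-in for a tensor log: -inf at 0 instead of an exception."""
    return math.log(p) if p > 0.0 else float("-inf")


def naive_bce(logit, label):
    """Cross-entropy computed on the sigmoid output, as in the formula above."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return label * -log_like_tf(p) + (1.0 - label) * -log_like_tf(1.0 - p)


def stable_bce(logit, label):
    """Same loss computed from the logit: max(x, 0) - x*z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


print(naive_bce(0.5, 1.0), stable_bce(0.5, 1.0))    # both about 0.474
print(naive_bce(40.0, 1.0), stable_bce(40.0, 1.0))  # nan versus a tiny positive value
```

Using the TensorFlow function would require the last dense layer to output raw logits (activation=None) and applying the sigmoid separately for predictions.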
3. Defining Placeholders, initializing model and training.
Once my abstract model with its different parameters was defined, I created placeholders and built the computational graph, then I initialized the weights and started the training.
Issues: I started getting NaN values for the weights in the different layers and NaNs in the cost function as the optimization went on. So my first reflex was to debug and try to understand what happens.
I tried to test a simple case, which is as follows:
initialize weights ---> calculate cost and print it (print weights too) ---> do one optimization step ---> calculate cost and print it (print weights too).
Result:
The first print is fine: I get real values (as expected). However, after the first optimization step I get NaN values for the cost. Why does my optimizer make the cost go NaN after one optimization step?
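To narrow down which labels go wrong, the printed cost vector can be scanned for non-finite entries. This is a minimal sketch in plain Python; the helper name and the cost values are hypothetical and shortened to 3 labels.

```python
import math


def non_finite_indices(values):
    """Indices of entries that are NaN or +/-inf."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


# Hypothetical cost vectors before and after one optimization step.
cost_before = [0.69, 0.71, 0.68]
cost_after = [0.52, float("nan"), float("inf")]

print(non_finite_indices(cost_before))  # []
print(non_finite_indices(cost_after))   # [1, 2]
```

The same check can be applied to the flattened weights of each layer to see where the first non-finite value appears.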
Here is the code for the test (X_train and Y_train have shapes (1269,204,204,3) and (1269,39); I am taking only 4 elements of each for the test):
import tensorflow as tf
from tensorflow.python.framework import ops  # provides ops.reset_default_graph()

# clearing the graph
ops.reset_default_graph()
# defining placeholders
X = tf.placeholder(tf.float32, [None, X_train.shape[1],X_train.shape[2],X_train.shape[3]])
Y = tf.placeholder(tf.float32, [None, Y_train.shape[1]])
optimizer, cost=optimizer_and_cost(create_model(X,Y),Y)
# Initialize all the variables globally
init = tf.global_variables_initializer()
# Start the session to compute the tensorflow graph
sess=tf.Session()
sess.run(init)
# printing cost and first layer weights
print('first layer weights ',sess.run(tf.trainable_variables()[0]) )
print('cost: ',sess.run(cost,feed_dict={X:X_train[0:4,:], Y:Y_train[0:4,:]}))
# doing one optimization step
_ ,OK=sess.run([optimizer, cost], feed_dict={X:X_train[0:4,:], Y:Y_train[0:4,:]})
# printing cost and first layer weights
print('first layer weights ',sess.run(tf.trainable_variables()[0]) )
print('cost :',sess.run(cost,feed_dict={X:X_train[0:4,:], Y:Y_train[0:4,:]}))
# closing session
sess.close()
Any help is welcome.