
Using the MNIST tutorial of TensorFlow, I am trying to build a convolutional network for face recognition with the "Database of Faces".

The images are 112 x 92; I use 3 more convolutional layers to reduce them to 6 x 5, as advised here.

I'm very new to convolutional networks, and most of my layer declarations are made by analogy to the TensorFlow MNIST tutorial, so they may be a bit clumsy; feel free to advise me on this.

x_image = tf.reshape(x, [-1, 112, 92, 1])

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

W_conv3 = weight_variable([5, 5, 64, 128])
b_conv3 = bias_variable([128])
h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
h_pool3 = max_pool_2x2(h_conv3)

W_conv4 = weight_variable([5, 5, 128, 256])
b_conv4 = bias_variable([256])
h_conv4 = tf.nn.relu(conv2d(h_pool3, W_conv4) + b_conv4)
h_pool4 = max_pool_2x2(h_conv4)

W_conv5 = weight_variable([5, 5, 256, 512])
b_conv5 = bias_variable([512])
h_conv5 = tf.nn.relu(conv2d(h_pool4, W_conv5) + b_conv5)
h_pool5 = max_pool_2x2(h_conv5)

W_fc1 = weight_variable([6 * 5 * 512, 1024])
b_fc1 = bias_variable([1024])
h_pool5_flat = tf.reshape(h_pool5, [-1, 6 * 5 * 512])
h_fc1 = tf.nn.relu(tf.matmul(h_pool5_flat, W_fc1) + b_fc1)

keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

print orlfaces.train.num_classes # 40
W_fc2 = weight_variable([1024, orlfaces.train.num_classes])
b_fc2 = bias_variable([orlfaces.train.num_classes])
y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
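
For completeness, weight_variable, bias_variable, conv2d and max_pool_2x2 are the helper functions from the MNIST tutorial:

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # 1-pixel-stride convolution; "SAME" padding keeps the spatial size.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 halves each spatial dimension.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')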

My problem appears when the session runs the "correct_prediction" op, which is

tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
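
If I understand correctly, each tf.argmax(..., 1) should reduce a [batch_size, num_classes] matrix to a vector of shape [batch_size], so Equal should compare two vectors of the same length. A quick numpy sketch of the shapes I expect (illustrative values, not my actual data):

import numpy as np
# A hypothetical batch of 3 examples over 4 classes.
y_conv_np = np.array([[0.1, 0.7, 0.1, 0.1],
                      [0.6, 0.2, 0.1, 0.1],
                      [0.1, 0.1, 0.1, 0.7]])
y_np = np.array([[0, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 1, 0]])
print np.argmax(y_conv_np, 1)  # [1 0 3] -- shape (3,)
print np.argmax(y_np, 1)       # [1 0 2] -- shape (3,)
print np.argmax(y_conv_np, 1) == np.argmax(y_np, 1)  # [ True  True False]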

At least, that is what I think given the error message:

W tensorflow/core/common_runtime/executor.cc:1027] 0x19369d0 Compute status: Invalid argument: Incompatible shapes: [8] vs. [20]
     [[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Traceback (most recent call last):
  File "./convolutional.py", line 133, in <module>
    train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 405, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2728, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 345, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 419, in _do_run
    e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Incompatible shapes: [8] vs. [20]
     [[Node: Equal = Equal[T=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](ArgMax, ArgMax_1)]]
Caused by op u'Equal', defined at:
  File "./convolutional.py", line 125, in <module>
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 328, in equal
    return _op_def_lib.apply_op("Equal", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/op_def_library.py", line 633, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1710, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 988, in __init__
    self._traceback = _extract_stack()

It looks like y_conv outputs a matrix of shape 8 x batch_size instead of number_of_classes x batch_size.

If I change the batch size from 20 to 10, the error message stays the same, but instead of [8] vs. [20] I get [4] vs. [10]. From that I conclude that the problem may come from the y_conv declaration (last line of the code above).
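
While writing this up I also noticed that a tf.reshape with -1 never complains as long as the total number of elements divides evenly, so a wrong flatten size can silently shrink the batch dimension instead of raising an error. A numpy sketch with hypothetical dimensions (I have not verified these are the real ones): if h_pool5 actually had shape [20, 4, 3, 512] instead of the expected [20, 6, 5, 512], the reshape would produce 8 rows:

import numpy as np
# 20 * 4 * 3 * 512 = 122880 elements in total...
h_pool5_np = np.zeros((20, 4, 3, 512))
# ...which reshape silently regroups into rows of 6 * 5 * 512 = 15360 elements:
flat = h_pool5_np.reshape(-1, 6 * 5 * 512)
print flat.shape  # (8, 15360) -- the batch dimension became 8, not 20

With a batch of 10, the same arithmetic gives 4 rows, which would match the [4] vs. [10] message above.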

The loss function, optimizer, training, etc. declarations are the same as in the MNIST tutorial:

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
sess.run(tf.initialize_all_variables())
for i in xrange(1000):
    batch = orlfaces.train.next_batch(20)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 1.0})
        print "Step %d, training accuracy %g" % (i, train_accuracy)
    train_step.run(feed_dict = {x: batch[0], y_: batch[1], keep_prob: 0.5})

print "Test accuracy %g" % accuracy.eval(feed_dict = {x: orlfaces.test.images, y_: orlfaces.test.labels, keep_prob: 1.0})

Thanks for reading, have a good day.

    One suggestion: use the `get_shape()` method on the tensors to try and identify the point at which the shape diverged from what you expected. It's difficult to tell what the shapes of the various tensors are without inspecting the full program. – mrry Dec 11 '15 at 14:43
  • @mrry yep that's a good one to start debugging thk, will try and report if it gets me somewhere! – shorty_ponton Dec 11 '15 at 15:02

1 Answer


Well, after a lot of debugging, I found that my issue was due to a bad instantiation of the labels: instead of creating arrays full of zeros and setting one value to one, I created them with random values! Stupid mistake. In case someone is wondering what I did wrong there and how I fixed it, here is the change I made.
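
For reference, the zeros-then-set-one construction looks like this (a minimal numpy sketch; dense_to_one_hot is just an illustrative name, not the code from my project):

import numpy as np

def dense_to_one_hot(labels, num_classes):
    # One row per example: all zeros except a single 1 at the class index.
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1.0
    return one_hot

print dense_to_one_hot(np.array([0, 2, 1]), 3)
# [[ 1.  0.  0.]
#  [ 0.  0.  1.]
#  [ 0.  1.  0.]]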

Anyway, during all the debugging I did to find this mistake, I came across some information that is useful for debugging this kind of problem:

  1. For the cross-entropy declaration, the TensorFlow MNIST tutorial uses a formula that can lead to NaN values.

This formula is

cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
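
The problem is that if some entry of y_conv is exactly 0 where the label is also 0, the product becomes 0 * log(0) = 0 * -inf = nan, and the whole sum turns to NaN. A tiny numpy demonstration with made-up values:

import numpy as np
y_np = np.array([[0., 1.]])       # a one-hot label
y_conv_np = np.array([[0., 1.]])  # a "perfect" prediction containing an exact 0
print -np.sum(y_np * np.log(y_conv_np))  # nan, because 0 * -inf is nan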

Instead of this, I found two ways to declare it in a safer fashion:

cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y_conv, 1e-10, 1.0)))

or alternatively:

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, y_))

where logits is the last layer's output before the softmax (i.e. tf.matmul(h_fc1_drop, W_fc2) + b_fc2), since softmax_cross_entropy_with_logits applies the softmax itself.
  2. As mrry says, printing the shapes of the tensors can help to detect shape anomalies.

To get the shape of a tensor, just call its get_shape() method, like this:

print "W shape:", W.get_shape()
  3. user1111929, in this question, uses a debug print that helped me pinpoint where the problem came from.