
I am trying to design a convolutional neural network for detecting a small red football. I have captured approximately 4000 pictures of a scene in different configurations (adding chairs, bottles, etc.) without the ball, and 4000 pictures of the scene, also in different configurations, but with the ball somewhere inside. I am using a resolution of 32x32 px. The ball is clearly visible in the pictures where it is present. These are some positive example pictures (shown here upside down):

I have tried numerous combinations when designing the convolutional NN, but I cannot find a decent one. I will present two architectures I have tried (a "normal"-size one and a very small one). I kept designing smaller and smaller networks because I thought it would help with the over-fitting problem. So, I have tried:

Normal Network Design

Input: 32x32x3
First Conv Layer:

W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1), name="w1")
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]), name="b1")
h_conv1 = tf.nn.relu(tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1, name="conv1")
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name="pool1")

2nd Conv Layer:

W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 16], stddev=0.1), name="w2")
b_conv2 = tf.Variable(tf.constant(0.1, shape=[16]), name="b2")
h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2, name="conv2")
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name="pool2")

Fully connected layer:

W_fc1 = tf.Variable(tf.truncated_normal([8 * 8 * 16, 16], stddev=0.1), name="w3")
b_fc1 = tf.Variable(tf.constant(0.1, shape=[16]), name="b3")
h_pool2_flat = tf.reshape(h_pool2, [-1, 8 * 8 * 16], name="flat3")
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1, name="conv3")

Dropout

keep_prob = tf.placeholder(tf.float32, name="keep3")
h_fc2_drop = tf.nn.dropout(h_fc1, keep_prob, name="drop3")

Readout Layer

W_fc3 = tf.Variable(tf.truncated_normal([16, 2], stddev=0.1), name="w4")
b_fc3 = tf.Variable(tf.constant(0.1, shape=[2]), name="b4")
y_conv = tf.matmul(h_fc2_drop, W_fc3, name="yconv") + b_fc3

Other info

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)
    + 0.005 * tf.nn.l2_loss(W_conv1) + 0.005 * tf.nn.l2_loss(W_fc1) + 0.005 * tf.nn.l2_loss(W_fc3))

train_step = tf.train.AdamOptimizer(1e-5, name="trainingstep").minimize(cross_entropy)

# Percentage of correct predictions
prediction = tf.nn.softmax(y_conv, name="y_prediction")
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1), name="correct_pred")
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="acc")

Parameters

keep_prob = 0.4
batch_size = 500
training time in generations = 55
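
For context, here is a minimal sketch of the training loop these parameters drive (next_batch is a hypothetical stand-in for my actual batching code; dropout is active only during the training step and disabled with keep_prob=1.0 when measuring accuracy):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for generation in range(55):
        batch_x, batch_y = next_batch(batch_size=500)  # hypothetical batching helper
        # Dropout active while training.
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.4})
        # Dropout disabled when measuring accuracy.
        train_acc = sess.run(accuracy, feed_dict={x: batch_x, y_: batch_y, keep_prob: 1.0})
        print("generation %d, training accuracy %g" % (generation, train_acc))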

Results

Training set final accuracy= 90.2%
Validation set final accuracy= 52.2%

Graph link: Link to accuracy graph

Small Network Design

Input: 32x32x3

First Conv Layer:

W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, 16], stddev=0.1), name="w1")
b_conv1 = tf.Variable(tf.constant(0.1, shape=[16]), name="b1")
h_conv1 = tf.nn.relu(tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1, name="conv1")
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name="pool1")

Fully connected layer:

W_fc1 = tf.Variable(tf.truncated_normal([16 * 16 * 16, 8], stddev=0.1), name="w3")
b_fc1 = tf.Variable(tf.constant(0.1, shape=[8]), name="b3")
h_pool2_flat = tf.reshape(h_pool1, [-1, 16 * 16 * 16], name="flat3")
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1, name="conv3")

Dropout

keep_prob = tf.placeholder(tf.float32, name="keep3")
h_fc2_drop = tf.nn.dropout(h_fc1, keep_prob, name="drop3")

Readout Layer

W_fc3 = tf.Variable(tf.truncated_normal([8, 2], stddev=0.1), name="w4")
b_fc3 = tf.Variable(tf.constant(0.1, shape=[2]), name="b4")
y_conv = tf.matmul(h_fc2_drop, W_fc3, name="yconv") + b_fc3

Other info

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv)
    + 0.005 * tf.nn.l2_loss(W_conv1) + 0.005 * tf.nn.l2_loss(W_fc1) + 0.005 * tf.nn.l2_loss(W_fc3))

train_step = tf.train.AdamOptimizer(1e-5, name="trainingstep").minimize(cross_entropy)

# Percentage of correct predictions
prediction = tf.nn.softmax(y_conv, name="y_prediction")
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1), name="correct_pred")
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="acc")

Parameters

keep_prob = 0.4
batch_size = 500
training time in generations = 55

Results

Training set final accuracy= 87%
Validation set final accuracy= 60.6%

Graph link: Link to accuracy graph

So, whatever I do, I cannot get decent accuracy on the validation set. I am sure something is missing, but I cannot identify what. I am using dropout and L2 regularization, but the network seems to overfit anyway.

Thanks for reading. Whether you are an amateur or advanced in CNNs, please leave feedback.

Vlad
    i think you should use a better data set, deep learning requires HUGE datasets – bakaDev Aug 24 '17 at 15:23
  • btw use https://arxiv.org/abs/1512.03385 – bakaDev Aug 24 '17 at 15:24
  • Thanks for the input @bakaDev . It's a small CNN without so many layers and weights, it's 32x32 and it has only two outputs and seems a simple thing to recognise, a red ball in a environment. Do you think that 8000 pics aren't enough ? – Vlad Aug 24 '17 at 15:34
  • 1-dataset quality is very important 2-if you want to improve your model use could always optimize the hyper parameters:https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization – bakaDev Aug 24 '17 at 15:39
  • how did you split your data to train and valid? Is it random split or there is something qualitatively different about one set and another (like different room, different furniture?) – lejlot Aug 24 '17 at 21:56
  • Also, this looks really odd `h_pool2_flat = tf.reshape(h_pool1, [-1, 161616], name=“flat3”)`, what is "161616"? Just an error in copy pasting, and it was supposed to be 16*16*16 ? – lejlot Aug 24 '17 at 22:02
  • Hi @lejlot and thanks for joining the discussion. I have made 2 different sessions of pictures (training and validation) but there were in the same conditions. Yes, it is 16*16*16 there. Sorry for the typo – Vlad Aug 25 '17 at 12:20

1 Answer


Your results and accuracy curve look quite normal to me, so the model is learning fine. A few suggestions:

  • As already pointed out in the comments, you probably need a bigger data set. Compare your data set to CIFAR-10, which has 50000 training and 10000 test images, also 32x32. Your training data may simply not contain enough variation to predict your validation/test images. Consider image augmentation techniques to expand your data set artificially (see the sketch after this list).
  • When you have enough data, use most of it for training. For example, out of 10000 images, I'd split it like this: 7000 for training, 1500 for validation and 1500 for testing. This will make it less likely to overfit.
  • If you are sure that your training data set represents the target population well, you might want to play with your regularization hyperparameters: I noticed the dropout probability and the L2 regularizer. In general, by increasing these parameters you fight overfitting and improve generalization. Early layers usually need a smaller dropout value than later ones. Also consider trying batch normalization, another technique that helps generalization (a sketch follows after this list).
  • You might also want to tweak your other hyper-parameters (learning rate, filter size, number of filters, batch size, etc.) to get better performance. Here's a good discussion of how to do it efficiently.
  • Did you stop training after 10 epochs (that is the limit on your charts)? You should probably give it more time, because for CIFAR-10 it sometimes takes 30-50 epochs to learn well.
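
Here is a minimal augmentation sketch for the first point (my assumptions: TF 1.x, images as float tensors scaled to [0, 1]; hue jitter is deliberately avoided because the red colour is presumably the main cue for the ball):

import tensorflow as tf

def augment(image):
    # Random horizontal flip plus mild photometric jitter.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    # Keep pixel values in a valid range after the jitter.
    return tf.clip_by_value(image, 0.0, 1.0)

Map this over the training images only (validation images stay untouched), so the network sees a slightly different variant of each picture every epoch.

And a sketch of how batch normalization could be wired into your graph (again an assumption on my side, using tf.layers; the is_training placeholder and the update-ops dependency are the two details that are easy to get wrong):

is_training = tf.placeholder(tf.bool, name="is_training")
# Normalize the conv activations before pooling; repeat for each conv layer.
h_bn1 = tf.layers.batch_normalization(h_conv1, training=is_training)
# Batchnorm keeps moving averages that must be updated alongside the weights.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_step = tf.train.AdamOptimizer(1e-5, name="trainingstep").minimize(cross_entropy)

Feed is_training=True during training and is_training=False at evaluation.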
Maxim