
I'm trying to train a convolutional neural network similar to Facenet and Openface. My model is inspired by VGG-16 (posted below).

The problem is that when I add output = tf.nn.l2_normalize(output, 0) before returning the output, the accuracy drops significantly. Without l2_normalize I get close to 98 percent accuracy. However, the OpenFace model described here uses it.

These are my results with output = tf.nn.l2_normalize(output, 0):

epoch 0  loss 0.36241 acc 45.59 
nr_test_examples 6400
total_batch_test 100
TEST epoch 0  loss 0.20000 acc 48.45 
epoch 1  loss 0.20000 acc 48.62 
nr_test_examples 6400
total_batch_test 100
TEST epoch 1  loss 0.20000 acc 57.81 
epoch 2  loss 0.20000 acc 49.34 
nr_test_examples 6400
total_batch_test 100
TEST epoch 2  loss 0.20000 acc 43.75 
epoch 3  loss 0.20000 acc 48.97 
nr_test_examples 6400
total_batch_test 100
TEST epoch 3  loss 0.20000 acc 53.12 
epoch 4  loss 0.20000 acc 48.16 
nr_test_examples 6400
total_batch_test 100
TEST epoch 4  loss 0.20000 acc 53.12 
epoch 5  loss 0.20000 acc 49.45 
nr_test_examples 6400
total_batch_test 100
TEST epoch 5  loss 0.20000 acc 56.25 
epoch 6  loss 0.20000 acc 48.75 
nr_test_examples 6400
total_batch_test 100
TEST epoch 6  loss 0.20000 acc 53.12 
epoch 7  loss 0.20000 acc 48.58 
nr_test_examples 6400

EDIT - these are my results without tf.nn.l2_normalize(output,0)

epoch 0  loss 0.20137 acc 56.56 
nr_test_examples 6400
total_batch_test 100
TEST epoch 0  loss 0.15097 acc 73.44 
epoch 1  loss 0.20044 acc 57.64 
nr_test_examples 6400
total_batch_test 100
TEST epoch 1  loss 0.10509 acc 82.81 
epoch 2  loss 0.19985 acc 58.14 
nr_test_examples 6400
total_batch_test 100
TEST epoch 2  loss 0.09480 acc 78.12 
epoch 3  loss 0.19978 acc 58.89 
nr_test_examples 6400
total_batch_test 100
TEST epoch 3  loss 0.07886 acc 82.81 
epoch 4  loss 0.20060 acc 59.12 
nr_test_examples 6400
total_batch_test 100
TEST epoch 4  loss 0.05395 acc 85.94 
epoch 5  loss 0.19938 acc 59.39 
nr_test_examples 6400
total_batch_test 100
TEST epoch 5  loss 0.07320 acc 87.50 
epoch 6  loss 0.20056 acc 59.14 
nr_test_examples 6400
total_batch_test 100

Why does this happen? As the loss function I am using triplet loss (considering only the triplets with loss greater than zero).
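For reference, the loss described above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the code used for the runs above: the function name `triplet_loss`, the use of squared Euclidean distances, and the margin of 0.2 are all assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean triplet loss over the triplets whose loss is greater than zero.

    anchor, positive, negative: arrays of shape (batch, dim).
    margin: assumed value for illustration, not taken from the training code.
    """
    pos_dist = np.sum((anchor - positive) ** 2, axis=1)  # squared distance anchor-positive
    neg_dist = np.sum((anchor - negative) ** 2, axis=1)  # squared distance anchor-negative
    per_triplet = np.maximum(pos_dist - neg_dist + margin, 0.0)
    active = per_triplet > 0  # keep only triplets that still violate the margin
    if not np.any(active):
        return 0.0
    return float(per_triplet[active].mean())

# If every embedding collapses to the same point, both distances are zero
# and each triplet contributes exactly the margin.
same = np.ones((4, 128))
collapsed_loss = triplet_loss(same, same, same)

# A triplet that already satisfies the margin contributes nothing.
easy_loss = triplet_loss(np.array([[0.0, 0.0]]),
                         np.array([[0.0, 0.0]]),
                         np.array([[1.0, 0.0]]))
```

Under these assumptions, a loss that sits exactly at the margin value would be what collapsed embeddings produce, which may be worth checking when the reported loss stops moving.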

def siamese_convnet(x):

    w_conv1_1 = tf.get_variable(name='w_conv1_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 1, 64])
    w_conv1_2 = tf.get_variable(name='w_conv1_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 64])

    w_conv2_1 = tf.get_variable(name='w_conv2_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 64, 128])
    w_conv2_2 = tf.get_variable(name='w_conv2_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 128])

    w_conv3_1 = tf.get_variable(name='w_conv3_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 128, 256])
    w_conv3_2 = tf.get_variable(name='w_conv3_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])
    w_conv3_3 = tf.get_variable(name='w_conv3_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 256])

    w_conv4_1 = tf.get_variable(name='w_conv4_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 256, 512])
    w_conv4_2 = tf.get_variable(name='w_conv4_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv4_3 = tf.get_variable(name='w_conv4_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])

    w_conv5_1 = tf.get_variable(name='w_conv5_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_2 = tf.get_variable(name='w_conv5_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[3, 3, 512, 512])
    w_conv5_3 = tf.get_variable(name='w_conv5_3', initializer=tf.contrib.layers.xavier_initializer(), shape=[1, 1, 512, 512])

    w_fc_1 = tf.get_variable(name='w_fc_1', initializer=tf.contrib.layers.xavier_initializer(), shape=[5*5*512, 2048])
    w_fc_2 = tf.get_variable(name='w_fc_2', initializer=tf.contrib.layers.xavier_initializer(), shape=[2048, 1024])

    w_out = tf.get_variable(name='w_out', initializer=tf.contrib.layers.xavier_initializer(), shape=[1024, 128])

    bias_conv1_1 = tf.get_variable(name='bias_conv1_1', initializer=tf.constant(0.01, shape=[64]))
    bias_conv1_2 = tf.get_variable(name='bias_conv1_2', initializer=tf.constant(0.01, shape=[64]))

    bias_conv2_1 = tf.get_variable(name='bias_conv2_1', initializer=tf.constant(0.01, shape=[128]))
    bias_conv2_2 = tf.get_variable(name='bias_conv2_2', initializer=tf.constant(0.01, shape=[128]))

    bias_conv3_1 = tf.get_variable(name='bias_conv3_1', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_2 = tf.get_variable(name='bias_conv3_2', initializer=tf.constant(0.01, shape=[256]))
    bias_conv3_3 = tf.get_variable(name='bias_conv3_3', initializer=tf.constant(0.01, shape=[256]))

    bias_conv4_1 = tf.get_variable(name='bias_conv4_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_2 = tf.get_variable(name='bias_conv4_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv4_3 = tf.get_variable(name='bias_conv4_3', initializer=tf.constant(0.01, shape=[512]))

    bias_conv5_1 = tf.get_variable(name='bias_conv5_1', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_2 = tf.get_variable(name='bias_conv5_2', initializer=tf.constant(0.01, shape=[512]))
    bias_conv5_3 = tf.get_variable(name='bias_conv5_3', initializer=tf.constant(0.01, shape=[512]))

    bias_fc_1 = tf.get_variable(name='bias_fc_1', initializer=tf.constant(0.01, shape=[2048]))
    bias_fc_2 = tf.get_variable(name='bias_fc_2', initializer=tf.constant(0.01, shape=[1024]))

    out = tf.get_variable(name='out', initializer=tf.constant(0.01, shape=[128]))

    x = tf.reshape(x, [-1, 160, 160, 1])

    conv1_1 = tf.nn.relu(conv2d(x, w_conv1_1) + bias_conv1_1)
    conv1_2 = tf.nn.relu(conv2d(conv1_1, w_conv1_2) + bias_conv1_2)

    max_pool1 = max_pool(conv1_2)

    conv2_1 = tf.nn.relu(conv2d(max_pool1, w_conv2_1) + bias_conv2_1)
    conv2_2 = tf.nn.relu(conv2d(conv2_1, w_conv2_2) + bias_conv2_2)

    max_pool2 = max_pool(conv2_2)

    conv3_1 = tf.nn.relu(conv2d(max_pool2, w_conv3_1) + bias_conv3_1)
    conv3_2 = tf.nn.relu(conv2d(conv3_1, w_conv3_2) + bias_conv3_2)
    conv3_3 = tf.nn.relu(conv2d(conv3_2, w_conv3_3) + bias_conv3_3)

    max_pool3 = max_pool(conv3_3)

    conv4_1 = tf.nn.relu(conv2d(max_pool3, w_conv4_1) + bias_conv4_1)
    conv4_2 = tf.nn.relu(conv2d(conv4_1, w_conv4_2) + bias_conv4_2)
    conv4_3 = tf.nn.relu(conv2d(conv4_2, w_conv4_3) + bias_conv4_3)

    max_pool4 = max_pool(conv4_3)

    conv5_1 = tf.nn.relu(conv2d(max_pool4, w_conv5_1) + bias_conv5_1)
    conv5_2 = tf.nn.relu(conv2d(conv5_1, w_conv5_2) + bias_conv5_2)
    conv5_3 = tf.nn.relu(conv2d(conv5_2, w_conv5_3) + bias_conv5_3)

    max_pool5 = max_pool(conv5_3)

    fc_helper = tf.reshape(max_pool5, [-1, 5*5*512])
    fc_1 = tf.nn.relu(tf.matmul(fc_helper, w_fc_1) + bias_fc_1)

    fc_2 = tf.nn.relu(tf.matmul(fc_1, w_fc_2) + bias_fc_2)

    output = tf.matmul(fc_2, w_out) + out
    output = tf.nn.l2_normalize(output, 0)

    return output

LATER EDIT

I am already normalizing the images before I send them to the convolutional neural network, so why would normalizing the output be necessary?

However, I use the output to encode an image containing a face as 128 values. I then compare the original image with images of other people and decide who is in the original image using the Euclidean distance between the 128 features of each image. So I was thinking that normalizing the output helps with these comparisons (calculating Euclidean distances between the features the network generates for each image).

So, considering this, should I use tf.nn.l2_normalize?
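As a rough illustration of the comparison step described above, here is a small NumPy sketch of what unit-length embeddings would give: the squared Euclidean distance between two unit vectors equals 2 minus twice their cosine similarity, so it always lies between 0 and 4. The helper `unit` and the random 128-dimensional vectors are stand-ins for real network outputs, not part of my code.

```python
import numpy as np

def unit(v):
    """Scale a single embedding vector to L2 norm 1 (hypothetical helper)."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
a = unit(rng.normal(size=128))  # stand-in for the embedding of one face image
b = unit(rng.normal(size=128))  # stand-in for the embedding of another image

sq_dist = float(np.sum((a - b) ** 2))  # squared Euclidean distance
cosine = float(np.dot(a, b))           # cosine similarity of unit vectors

# For unit vectors: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 * cosine
identity_gap = abs(sq_dist - (2.0 - 2.0 * cosine))
```

With distances bounded this way, a fixed margin or decision threshold keeps the same meaning across images, which is presumably why embeddings are normalized in that kind of setup.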

def get_opencv_image_casia(self, file):  # file is the path to the image

        img_helper_1 = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
        img_helper_1 = cv2.resize(img_helper_1, (160, 160))

        img1 = np.reshape(img_helper_1, (25600))

        img1 = np.array(img1, dtype=np.uint8)
        img1 = img1.astype('float32')

        img1_pos = (img1 - img1.mean()) / (img1.std() + 1e-8)

        return (img1_pos, file)
Hello Lili
  • In general, adding L2 regularization will have the effect of lowering your **training** accuracy and raising your **generalization** (test set) accuracy. Are you computing both? – David Parks Nov 11 '17 at 20:01
  • Normalization, for example through feature scaling, is usually applied to input data so that your features vary in a similar way. I don't think normalizing your output makes sense. (Regularization is a method of reducing the complexity of your model to prevent overfitting, often by causing some of the weights to become essentially zero. It's typically a term added to the loss function during training only.) The code in your link is in Lua, so it's not clear to me what their call to `nn.Normalize` is doing. When you call `tf.nn.l2_normalize` in your code, what do you want it to do? – Stephen Nov 12 '17 at 03:28
  • @DavidParks the lines starting with TEST show my test set accuracy. It's much lower than the test set accuracy calculated by the network without nn.l2_normalize. Also I added an edit, I actually am normalizing the input images before I send them to the network. So is nn.l2_normalize necessary? – Hello Lili Nov 12 '17 at 13:11
  • @Stephen my thoughts exactly. I am normalizing the input images, so why would normalizing the output be necessary? However, I use the output to encode an image with a face by 128 values. Then I compare the image with other persons, and decide who is the person in the original image by using **euclidean distance between the 128 features**. So I was thinking that normalizing the output helps when using euclidean distances. – Hello Lili Nov 12 '17 at 13:15
  • @DavidParks I added the results on the TEST set without *tf.nn.l2_normalize* and they look much better. – Hello Lili Nov 12 '17 at 13:18
  • A bit late, but I think the correct code should be `tf.nn.l2_normalize(output, axis=1)` since you want each output to have L2 norm 1 so you need to normalize across the axis 1. If you normalize across the axis 0 (the batch dimension), you will get a different result. – Olivier Moindrot Aug 20 '18 at 08:57
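
The difference between the two axes mentioned in the last comment can be checked with a small NumPy sketch. The helper `l2_normalize` below is a stand-in written to follow the documented formula of `tf.nn.l2_normalize`, `x / sqrt(max(sum(x**2), epsilon))`; it is not the TensorFlow implementation, and the toy batch is made up for illustration.

```python
import numpy as np

def l2_normalize(x, axis, epsilon=1e-12):
    """NumPy stand-in following the documented formula of tf.nn.l2_normalize."""
    square_sum = np.sum(np.square(x), axis=axis, keepdims=True)
    return x / np.sqrt(np.maximum(square_sum, epsilon))

# Toy batch: 3 examples (rows), each with a 2-dimensional embedding (columns).
batch = np.array([[3.0, 4.0],
                  [6.0, 8.0],
                  [1.0, 0.0]])

per_example = l2_normalize(batch, axis=1)  # each row is scaled to norm 1
per_feature = l2_normalize(batch, axis=0)  # each column is scaled to norm 1

row_norms_axis1 = np.linalg.norm(per_example, axis=1)  # norm of each embedding
row_norms_axis0 = np.linalg.norm(per_feature, axis=1)  # embeddings are not unit length
col_norms_axis0 = np.linalg.norm(per_feature, axis=0)  # columns are unit length instead
```

With `axis=0`, each example's embedding is divided by a quantity computed over the whole batch, so the same image can get a different embedding depending on which other images share its batch.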
