I am trying to train a binary classification model in Caffe that tells whether an input image is a dog or background. I have 8223 positive samples and 33472 negative samples. My validation set contains 1200 samples, 600 of each class. In fact, my positives are snippets taken from the MS-COCO dataset. All images are resized so that the bigger dimension does not exceed 92 and the smaller dimension is not smaller than 44.

After creating the LMDB files using create_imagenet.sh (resize=false), I started training with the solver and train .prototxt files below. The problem is that I am getting a constant accuracy (0.513333 or 0.486667), which indicates that the network is not learning anything. I hope someone is able to help. Thank you in advance.

solver file:

    iter_size: 32
    test_iter: 600
    test_interval: 20
    base_lr: 0.001
    display: 2
    max_iter: 20000
    lr_policy: "step"
    gamma: 0.99
    stepsize: 700
    momentum: 0.9
    weight_decay: 0.0001
    snapshot: 40
    snapshot_prefix: "/media/DATA/classifiers_data/dog_object/models/"
    solver_mode: GPU
    net: "/media/DATA/classifiers_data/dog_object/net.prototxt"
    solver_type: ADAM

train.prototxt:

    layer {
      name: "train-data"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TRAIN
      }

      data_param {
        source: "/media/DATA/classifiers_data/dog_object/ilsvrc12_train_lmdb"
        batch_size: 1
        backend: LMDB
      }
    }
    layer {
      name: "val-data"
      type: "Data"
      top: "data"
      top: "label"
      include {
        phase: TEST
      }
      data_param {
        source: "/media/DATA/classifiers_data/dog_object/ilsvrc12_val_lmdb"
        batch_size: 1
        backend: LMDB
      }
    }

    layer {
      name: "scale"
      type: "Power"
      bottom: "data"
      top: "scale"
      power_param {
        scale: 0.00390625
      }
    }

    layer {
      bottom: "scale"
      top: "conv1_1"
      name: "conv1_1"
      type: "Convolution"
      convolution_param {
        num_output: 64
        pad: 1
        kernel_size: 3
      }
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 1
      }
    }
    layer {
      bottom: "conv1_1"
      top: "conv1_1"
      name: "relu1_1"
      type: "ReLU"
    }
    layer {
      bottom: "conv1_1"
      top: "conv1_2"
      name: "conv1_2"
      type: "Convolution"
      convolution_param {
        num_output: 64
        pad: 1
        kernel_size: 3
      }
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 1
      }
    }

    layer {
      bottom: "conv1_2"
      top: "conv1_2"
      name: "relu1_2"
      type: "ReLU"
    }
    layer {
      name: "spatial_pyramid_pooling"
      type: "SPP"
      bottom: "conv1_2"
      top: "spatial_pyramid_pooling"
      spp_param {
        pool: MAX
        pyramid_height : 4
      }
    }
    layer {
      bottom: "spatial_pyramid_pooling"
      top: "fc6"
      name: "fc6"
      type: "InnerProduct"
      inner_product_param {
        num_output: 64
      }
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 1
      }
    }
    layer {
      bottom: "fc6"
      top: "fc6"
      name: "relu6"
      type: "ReLU"
    }
    layer {
      bottom: "fc6"
      top: "fc6"
      name: "drop6"
      type: "Dropout"
      dropout_param {
        dropout_ratio: 0.5
      }
    }
    layer {
      bottom: "fc6"
      top: "fc7"
      name: "fc7"
      type: "InnerProduct"
      inner_product_param {
        num_output: 2
      }
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 1
      }
    }
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "fc7"
      bottom: "label"
      top: "loss"
    }
    layer {
      name: "accuracy/top1"
      type: "Accuracy"
      bottom: "fc7"
      bottom: "label"
      top: "accuracy"
      include: { phase: TEST }
    }

part of training log:

    I1125 15:52:36.604038 2326 solver.cpp:362] Iteration 40, Testing net (#0)
    I1125 15:52:36.604071 2326 net.cpp:723] Ignoring source layer train-data
    I1125 15:52:47.127979 2326 solver.cpp:429] Test net output #0: accuracy = 0.486667
    I1125 15:52:47.128067 2326 solver.cpp:429] Test net output #1: loss = 0.694894 (* 1 = 0.694894 loss)
    I1125 15:52:48.937928 2326 solver.cpp:242] Iteration 40 (0.141947 iter/s, 14.0897s/2 iter), loss = 0.67717
    I1125 15:52:48.938014 2326 solver.cpp:261] Train net output #0: loss = 0.655692 (* 1 = 0.655692 loss)
    I1125 15:52:48.938040 2326 sgd_solver.cpp:106] Iteration 40, lr = 0.001
    I1125 15:52:52.858757 2326 solver.cpp:242] Iteration 42 (0.510097 iter/s, 3.92083s/2 iter), loss = 0.673962
    I1125 15:52:52.858841 2326 solver.cpp:261] Train net output #0: loss = 0.653978 (* 1 = 0.653978 loss)
    I1125 15:52:52.858875 2326 sgd_solver.cpp:106] Iteration 42, lr = 0.001
    I1125 15:52:56.581573 2326 solver.cpp:242] Iteration 44 (0.53723 iter/s, 3.7228s/2 iter), loss = 0.673144
    I1125 15:52:56.581656 2326 solver.cpp:261] Train net output #0: loss = 0.652269 (* 1 = 0.652269 loss)
    I1125 15:52:56.581689 2326 sgd_solver.cpp:106] Iteration 44, lr = 0.001
    I1125 15:53:00.192082 2326 solver.cpp:242] Iteration 46 (0.553941 iter/s, 3.61049s/2 iter), loss = 0.669606
    I1125 15:53:00.192167 2326 solver.cpp:261] Train net output #0: loss = 0.650559 (* 1 = 0.650559 loss)
    I1125 15:53:00.192200 2326 sgd_solver.cpp:106] Iteration 46, lr = 0.001
    I1125 15:53:04.195417 2326 solver.cpp:242] Iteration 48 (0.499585 iter/s, 4.00332s/2 iter), loss = 0.674327
    I1125 15:53:04.195691 2326 solver.cpp:261] Train net output #0: loss = 0.648808 (* 1 = 0.648808 loss)
    I1125 15:53:04.195736 2326 sgd_solver.cpp:106] Iteration 48, lr = 0.001
    I1125 15:53:07.856842 2326 solver.cpp:242] Iteration 50 (0.546265 iter/s, 3.66123s/2 iter), loss = 0.661835
    I1125 15:53:07.856925 2326 solver.cpp:261] Train net output #0: loss = 0.647097 (* 1 = 0.647097 loss)
    I1125 15:53:07.856957 2326 sgd_solver.cpp:106] Iteration 50, lr = 0.001
    I1125 15:53:11.681635 2326 solver.cpp:242] Iteration 52 (0.522906 iter/s, 3.82478s/2 iter), loss = 0.66071
    I1125 15:53:11.681720 2326 solver.cpp:261] Train net output #0: loss = 0.743264 (* 1 = 0.743264 loss)
    I1125 15:53:11.681754 2326 sgd_solver.cpp:106] Iteration 52, lr = 0.001
    I1125 15:53:15.544859 2326 solver.cpp:242] Iteration 54 (0.517707 iter/s, 3.86319s/2 iter), loss = 0.656414
    I1125 15:53:15.544950 2326 solver.cpp:261] Train net output #0: loss = 0.643741 (* 1 = 0.643741 loss)
    I1125 15:53:15.544986 2326 sgd_solver.cpp:106] Iteration 54, lr = 0.001
    I1125 15:53:19.354320 2326 solver.cpp:242] Iteration 56 (0.525012 iter/s, 3.80943s/2 iter), loss = 0.645277
    I1125 15:53:19.354404 2326 solver.cpp:261] Train net output #0: loss = 0.747059 (* 1 = 0.747059 loss)
    I1125 15:53:19.354431 2326 sgd_solver.cpp:106] Iteration 56, lr = 0.001
    I1125 15:53:23.195466 2326 solver.cpp:242] Iteration 58 (0.520681 iter/s, 3.84112s/2 iter), loss = 0.677604
    I1125 15:53:23.195549 2326 solver.cpp:261] Train net output #0: loss = 0.640145 (* 1 = 0.640145 loss)
    I1125 15:53:23.195575 2326 sgd_solver.cpp:106] Iteration 58, lr = 0.001
    I1125 15:53:25.140920 2326 solver.cpp:362] Iteration 60, Testing net (#0)
    I1125 15:53:25.140965 2326 net.cpp:723] Ignoring source layer train-data
    I1125 15:53:35.672775 2326 solver.cpp:429] Test net output #0: accuracy = 0.513333
    I1125 15:53:35.672937 2326 solver.cpp:429] Test net output #1: loss = 0.69323 (* 1 = 0.69323 loss)
    I1125 15:53:37.635395 2326 solver.cpp:242] Iteration 60 (0.138503 iter/s, 14.4401s/2 iter), loss = 0.655983
    I1125 15:53:37.635478 2326 solver.cpp:261] Train net output #0: loss = 0.638368 (* 1 = 0.638368 loss)
    I1125 15:53:37.635512 2326 sgd_solver.cpp:106] Iteration 60, lr = 0.001
    I1125 15:53:41.458472 2326 solver.cpp:242] Iteration 62 (0.523143 iter/s, 3.82305s/2 iter), loss = 0.672996
    I1125 15:53:41.458555 2326 solver.cpp:261] Train net output #0: loss = 0.753101 (* 1 = 0.753101 loss)
    I1125 15:53:41.458588 2326 sgd_solver.cpp:106] Iteration 62, lr = 0.001
    I1125 15:53:45.299643 2326 solver.cpp:242] Iteration 64 (0.520679 iter/s, 3.84114s/2 iter), loss = 0.668675
    I1125 15:53:45.299737 2326 solver.cpp:261] Train net output #0: loss = 0.634894 (* 1 = 0.634894 loss)

MoMo

1 Answer

A few comments:
1. Your test set contains 1200 samples, but you are validating on only 600 each time: test_iter*batch_size=600. See this answer for more details.
2. Did you shuffle your training data when you created your lmdb? See this answer for more details.
3. How do you initialize your weights? There seems to be no call for fillers in your prototxt file. If you do not explicitly define fillers, your weights are initialized to zero, which is a very difficult starting point for SGD; a sketch of adding fillers follows this list. See this answer for more details.
4. Have you tried setting debug_info: true in your solver and looking into the debug log to trace the root cause of your problem? See this thread for more details.
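
For point 3, here is a minimal sketch of what adding fillers to the first convolution layer could look like ("xavier" for the weights, a constant zero for the biases; the same idea applies to the other Convolution and InnerProduct layers):

    layer {
      bottom: "scale"
      top: "conv1_1"
      name: "conv1_1"
      type: "Convolution"
      convolution_param {
        num_output: 64
        pad: 1
        kernel_size: 3
        # initialize weights with the Xavier scheme instead of the default constant 0
        weight_filler {
          type: "xavier"
        }
        # biases can safely start at zero
        bias_filler {
          type: "constant"
          value: 0
        }
      }
      param {
        lr_mult: 1
      }
      param {
        lr_mult: 1
      }
    }

The specific filler choices here are only one reasonable default; the important part is that the weights start from something other than all zeros so the gradients can break symmetry.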

Shai
  • Thank you very much, I appreciate your help. 1. I used 'test_iter: 600' just for the sake of speed. For now I have increased the validation set to 1800 samples and changed to 'test_iter: 1800'. 2. My training data are well shuffled. 3. I have tried 'xavier' and 'gaussian' fillers with different std's, but the results did not change. 4. I tried 'debug_info: true' as mentioned in the answer. I did not find any nan or zero diffs. I am not sure, but the L2 values seem to be OK. 5. I tried different solver types and learning rates, but it did not help. I wish I could show my log, but it is too long. – MoMo Nov 26 '17 at 12:23
  • Could it be that I am using an imbalanced dataset (8223 positives to 33472 negatives)? – MoMo Nov 26 '17 at 12:33
  • @MoMo I have two more concerns: (1) Imbalanced data as you already noted. (2) `batch_size: 1` seems very small. Have you tried increasing the batch size? – Shai Nov 26 '17 at 13:33
  • Can you artificially "balance" the data? That is, in the text file listing all your examples (the input to `convert_imageset`), can you duplicate the lines of the positive examples so that the number of positives matches the number of negatives? – Shai Nov 26 '17 at 13:34
  • I will balance the dataset and comment on what comes out. Regarding 'batch_size: 1', I have 'iter_size: 32' in the solver file. AFAIK, it accumulates the gradients from 32 samples and updates the weights based on the average. – MoMo Nov 26 '17 at 14:09
  • @MoMo You are correct regarding [`iter_size`](https://stackoverflow.com/q/36526959/1714410) – Shai Nov 26 '17 at 14:17
  • I conducted multiple experiments. First, I reduced the negatives to 21000 and then to 11000; it didn't help in either case. Then, I added a third conv layer on top of the first two layers and fine-tuned those two from a pre-trained VGG16 caffemodel. With 21000 negatives it didn't help, but interestingly, the accuracy started to go up (0.83 now) with 11000 negatives. Now I will augment the positives to reach ~30000 and try the balanced dataset with fine-tuning. I will comment with the results. – MoMo Nov 26 '17 at 17:39
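
As a general note on the `iter_size` point discussed above (a sketch for illustration, not part of the original exchange): with gradient accumulation the effective batch size is batch_size * iter_size, so the setup from the question and a plain 32-image batch should yield comparable weight updates; the latter is usually faster because the forward/backward passes are batched.

    # solver.prototxt: accumulate gradients over 32 single-image mini-batches
    iter_size: 32
    # matching train data layer in net.prototxt
    data_param {
      source: "/media/DATA/classifiers_data/dog_object/ilsvrc12_train_lmdb"
      batch_size: 1
      backend: LMDB
    }

    # roughly equivalent effective batch without accumulation
    iter_size: 1
    data_param {
      source: "/media/DATA/classifiers_data/dog_object/ilsvrc12_train_lmdb"
      batch_size: 32
      backend: LMDB
    }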