
My adventures in machine learning started with Caffe a month ago. I use the Python interface and have learned a lot from the IPython notebooks provided in the caffe/examples/ directory.

Having successfully run the notebook 01-learning-lenet.ipynb, and having been able to load the Caffe model (.caffemodel file) I had created, I ventured into applying the same LeNet neural network to my own data for character recognition.

So I've used the convert_imageset tool, as detailed here, to create my own lmdb files for training and testing. My images are .png, 60x60 in size, 1 channel. I have followed the same approach as in the above-mentioned notebook, adapting the relevant parameters, paths and filenames. I can plot the accuracy and loss (accuracy > 90%), and with a small additional script I've written I can output the labels of the images that lead to a wrong prediction. So all seems to be in good shape.
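For reference, convert_imageset expects a plain-text listing of image paths and integer labels as input. A minimal sketch of generating such a listing (assuming a hypothetical layout with one subfolder per class, which is not necessarily how your data is organized):

```python
import os

def write_listfile(root, out_path):
    """Write a 'relative/path.png label' listing for convert_imageset.

    Assumes one subfolder per class under `root`; classes are assigned
    integer labels in sorted order.
    """
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    with open(out_path, "w") as f:
        for label, cls in enumerate(classes):
            for name in sorted(os.listdir(os.path.join(root, cls))):
                if name.endswith(".png"):
                    f.write("%s/%s %d\n" % (cls, name, label))
    return classes
```

The resulting file is then passed to convert_imageset along with the root folder and the output lmdb name.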

But when I want to load my Caffe model, it fails and my kernel dies.

Here are the reasons I've found so far that can make the caffemodel fail to load:

  • wrong path and/or file names for model_def and model_weights (easy to identify from the error message; in this case the kernel doesn't die)
  • deploy.prototxt isn't properly defined.
    • It should follow the guidelines found on the BVLC caffe wiki, that is, removing any layer with data and labels from train.prototxt, and adding a final layer of type "Softmax".
    • improper setting of the dimensions of the first layer: the size of the input image should match the size declared in that first layer (input_param { shape: { dim: 1 dim: 1 dim: 60 dim: 60 } })
  • wrong size of the caffemodel (such as a few kB); the content can be checked using e.g. this script.
  • lmdb files incorrect (wrong size, or not created); see the options for convert_imageset here
    • I've set the resize flag to false (though it's a bit counterintuitive that a resize size of 0 is what indicates no resizing)
    • I've tried adding the --gray flag to force images to 1 channel, and adapted the corresponding dimension in deploy.prototxt
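The path and file-size issues in the list above can be ruled out before ever calling caffe.Net. A small pre-flight sketch in plain Python (the minimum size threshold is a guess for a LeNet-sized model; adjust it to your network):

```python
import os

def preflight(model_def, model_weights, min_weight_bytes=100000):
    """Basic sanity checks before handing files to caffe.Net."""
    for path in (model_def, model_weights):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    size = os.path.getsize(model_weights)
    if size < min_weight_bytes:
        raise ValueError("suspiciously small caffemodel: %d bytes" % size)
    return size
```

If this passes, the remaining suspects are the contents of the prototxt and lmdb files rather than the paths.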

Besides these reasons, why could a caffemodel fail to load when using net = caffe.Net(model_def, model_weights, caffe.TEST) with pycaffe (here used in an IPython notebook)?

Edit:

After training the model, obtaining some encouraging test results with good accuracy, and obtaining the .caffemodel and .snapshot snapshot files, the only message I get when executing the following lines is the one shown below.

model_def = '/home/Prog/caffe/examples/svi/lenet_auto_deploy_2.prototxt'
model_weights = '/home/Prog/caffe/examples/svi/custom_iter_100.caffemodel'

net = caffe.Net(model_def,model_weights,caffe.TEST) 

[screenshot: a dialog saying the kernel has died and will restart automatically]

Edit 2:

More information is output in the terminal window from which I launched the jupyter notebook. Here it is, again after running net = caffe.Net(model_def,model_weights,caffe.TEST) as above.

W0927 21:01:04.047416  4685 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W0927 21:01:04.047456  4685 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W0927 21:01:04.047472  4685 _caffe.cpp:142] Net('/home/Prog/caffe/examples/svi/lenet_auto_deploy_2.prototxt', 1, weights='/home/Prog/caffe/examples/svi/custom_iter_100.caffemodel')
I0927 21:01:04.047893  4685 net.cpp:51] Initializing net from parameters: 
state {
  phase: TEST
  level: 0
}
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param {
    shape {
      dim: 1
      dim: 1
      dim: 60
      dim: 60
    }
  }
}

[... to save space, only the first and last layers of the LeNet network are shown ...]

layer {
  name: "prob"
  type: "SoftMax"
  bottom: "score"
  top: "prob"
}
I0927 21:01:04.048302  4685 layer_factory.hpp:77] Creating layer data
I0927 21:01:04.048334  4685 net.cpp:84] Creating Layer data
I0927 21:01:04.048352  4685 net.cpp:380] data -> data
I0927 21:01:04.048384  4685 net.cpp:122] Setting up data
I0927 21:01:04.048406  4685 net.cpp:129] Top shape: 1 1 60 60 (3600)
I0927 21:01:04.048418  4685 net.cpp:137] Memory required for data: 14400
I0927 21:01:04.048429  4685 layer_factory.hpp:77] Creating layer conv1
I0927 21:01:04.048451  4685 net.cpp:84] Creating Layer conv1
I0927 21:01:04.048465  4685 net.cpp:406] conv1 <- data
I0927 21:01:04.048480  4685 net.cpp:380] conv1 -> conv1
I0927 21:01:04.048539  4685 net.cpp:122] Setting up conv1
I0927 21:01:04.048558  4685 net.cpp:129] Top shape: 1 20 56 56 (62720)
I0927 21:01:04.048568  4685 net.cpp:137] Memory required for data: 265280
I0927 21:01:04.048591  4685 layer_factory.hpp:77] Creating layer pool1
I0927 21:01:04.048610  4685 net.cpp:84] Creating Layer pool1
I0927 21:01:04.048622  4685 net.cpp:406] pool1 <- conv1
I0927 21:01:04.048637  4685 net.cpp:380] pool1 -> pool1
I0927 21:01:04.048660  4685 net.cpp:122] Setting up pool1
I0927 21:01:04.048676  4685 net.cpp:129] Top shape: 1 20 28 28 (15680)
I0927 21:01:04.048686  4685 net.cpp:137] Memory required for data: 328000
I0927 21:01:04.048696  4685 layer_factory.hpp:77] Creating layer conv2
I0927 21:01:04.048713  4685 net.cpp:84] Creating Layer conv2
I0927 21:01:04.048724  4685 net.cpp:406] conv2 <- pool1
I0927 21:01:04.048738  4685 net.cpp:380] conv2 -> conv2
I0927 21:01:04.049101  4685 net.cpp:122] Setting up conv2
I0927 21:01:04.049123  4685 net.cpp:129] Top shape: 1 50 24 24 (28800)
I0927 21:01:04.049135  4685 net.cpp:137] Memory required for data: 443200
I0927 21:01:04.049156  4685 layer_factory.hpp:77] Creating layer pool2
I0927 21:01:04.049175  4685 net.cpp:84] Creating Layer pool2
I0927 21:01:04.049187  4685 net.cpp:406] pool2 <- conv2
I0927 21:01:04.049201  4685 net.cpp:380] pool2 -> pool2
I0927 21:01:04.049224  4685 net.cpp:122] Setting up pool2
I0927 21:01:04.049242  4685 net.cpp:129] Top shape: 1 50 12 12 (7200)
I0927 21:01:04.049253  4685 net.cpp:137] Memory required for data: 472000
I0927 21:01:04.049264  4685 layer_factory.hpp:77] Creating layer fc1
I0927 21:01:04.049280  4685 net.cpp:84] Creating Layer fc1
I0927 21:01:04.049293  4685 net.cpp:406] fc1 <- pool2
I0927 21:01:04.049309  4685 net.cpp:380] fc1 -> fc1
I0927 21:01:04.096449  4685 net.cpp:122] Setting up fc1
I0927 21:01:04.096500  4685 net.cpp:129] Top shape: 1 500 (500)
I0927 21:01:04.096515  4685 net.cpp:137] Memory required for data: 474000
I0927 21:01:04.096545  4685 layer_factory.hpp:77] Creating layer relu1
I0927 21:01:04.096570  4685 net.cpp:84] Creating Layer relu1
I0927 21:01:04.096585  4685 net.cpp:406] relu1 <- fc1
I0927 21:01:04.096602  4685 net.cpp:367] relu1 -> fc1 (in-place)
I0927 21:01:04.096624  4685 net.cpp:122] Setting up relu1
I0927 21:01:04.096640  4685 net.cpp:129] Top shape: 1 500 (500)
I0927 21:01:04.096652  4685 net.cpp:137] Memory required for data: 476000
I0927 21:01:04.096664  4685 layer_factory.hpp:77] Creating layer score
I0927 21:01:04.096683  4685 net.cpp:84] Creating Layer score
I0927 21:01:04.096694  4685 net.cpp:406] score <- fc1
I0927 21:01:04.096714  4685 net.cpp:380] score -> score
I0927 21:01:04.096935  4685 net.cpp:122] Setting up score
I0927 21:01:04.096953  4685 net.cpp:129] Top shape: 1 26 (26)
I0927 21:01:04.096967  4685 net.cpp:137] Memory required for data: 476104
I0927 21:01:04.096987  4685 layer_factory.hpp:77] Creating layer prob
F0927 21:01:04.097034  4685 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: SoftMax (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Parameter, Pooling, Power, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
[I 21:01:04.851 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel d4f64b91-60d4-4fed-bf20-6ccce2018c10 restarted
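The Top shapes in the log above can be cross-checked by hand. A sketch assuming the standard LeNet geometry (5x5 convolutions with stride 1 and no padding, 2x2 max pooling with stride 2), which matches the shapes Caffe reports for a 60x60 input:

```python
def conv(size, kernel=5, stride=1, pad=0):
    # Output spatial size of a convolution layer.
    return (size + 2 * pad - kernel) // stride + 1

def pool(size, kernel=2, stride=2):
    # Output spatial size of a pooling layer.
    return (size - kernel) // stride + 1

s = 60           # input: 1 x 1 x 60 x 60
s = conv(s)      # conv1: 1 x 20 x 56 x 56
s = pool(s)      # pool1: 1 x 20 x 28 x 28
s = conv(s)      # conv2: 1 x 50 x 24 x 24
s = pool(s)      # pool2: 1 x 50 x 12 x 12
```

Since these agree with the log, the input dimensions are not the problem here; the failure happens later, at the "prob" layer.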
calocedrus
    Usually when the net = caffe.Net(...) line fails it prints a log to explain what went wrong. Could you please add it to your question? – rkellerm Sep 27 '17 at 06:13
  • The only error log it prints is related to setting a wrong path or filename, which immediately points to providing the correct path and filename. Otherwise all I have is a frame popping up saying the kernel has died and will restart automatically. – calocedrus Sep 27 '17 at 06:18
  • @calocedrus it is important that you add **any** log/output you get before crash. – Shai Sep 27 '17 at 07:07
  • @Shai I wish I could! But the only message I have is the uninformative indication that the kernel has died. Nothing else. Or is there a log file ... Wait... Yes... Stupid me, there's some information in the terminal window from which I launched the jupyter notebook! I'll add a second edit, and move on to analyzing what is written. – calocedrus Sep 27 '17 at 13:14
  • The log says `Unknown layer type: SoftMax` , and this is the pointer I believe... It's `Softmax`, not `SoftMax`. I've generated a `deploy.prototxt` using the script here https://stackoverflow.com/a/41027487/6358973 , and it outputs the wrong (?) spelling `SoftMax` in the final layer. – calocedrus Sep 27 '17 at 14:08
  • @calocedrus I fixed the `SoftMax` error in the linked answer. – Shai Sep 27 '17 at 14:12
  • @Shai :) I'm somehow making some progress, learning along the way how to decipher some precious information logged in the terminal by the notebook. Now that the `Softmax` is fixed, I have a new error log, relating to the network itself: should I add it to my question as well? I ask because my question is becoming really long with all these logs and edits... – calocedrus Sep 27 '17 at 14:18
  • @calocedrus I suppose you can ask a new one... – Shai Sep 27 '17 at 14:28
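The fix identified in the comments (`Softmax` vs. `SoftMax`) is easy to trip over, since layer type names in Caffe are case-sensitive. A small sketch that scans a deploy.prototxt for type strings differing from a known type only by case (the known-types set here is abridged from the error message above):

```python
import re

# Abridged from the "known types" list in the crash log above.
KNOWN_TYPES = {"Input", "Convolution", "Pooling", "InnerProduct",
               "ReLU", "Softmax", "SoftmaxWithLoss"}

def suspicious_types(prototxt_text):
    """Return declared layer types that differ from a known type only by case."""
    declared = re.findall(r'type:\s*"([^"]+)"', prototxt_text)
    lower_known = {t.lower() for t in KNOWN_TYPES}
    return [t for t in declared
            if t not in KNOWN_TYPES and t.lower() in lower_known]
```

Running this over the generated prototxt would flag "SoftMax" immediately, without crashing the kernel.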
