
I'm trying to modify caffe.proto in order to add two new fields to SolverParameter. The two lines I add, at the very end of the SolverParameter message, are:

   optional int32 start_lr_policy = 36; // Iteration to start CLR policy described in arXiv:1506.01186v2
   optional float max_lr = 37; // Maximum learning rate for CLR policy
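
After editing the file I rebuild Caffe with 'make all', which reruns protoc on caffe.proto. For the two fields above, the regenerated caffe.pb.h should then contain the usual proto2 accessors, roughly (paraphrased; exact signatures vary with the protobuf version):

    // Accessors protoc generates for the new optional fields (sketch).
    bool has_start_lr_policy() const;
    ::google::protobuf::int32 start_lr_policy() const;
    void set_start_lr_policy(::google::protobuf::int32 value);

    bool has_max_lr() const;
    float max_lr() const;
    void set_max_lr(float value);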

However, when I rerun training nets that worked before, I get the following error:

*** Aborted at 1442262720 (unix time) try "date -d @1442262720" if you are using GNU date ***
PC: @     0x7fad411c0933 caffe::SGDSolver<>::PreSolve()
*** SIGSEGV (@0x10) received by PID 25610 (TID 0x7fad41820a40) from PID 16; stack trace: ***
    @     0x7fad404ffd40 (unknown)
    @     0x7fad411c0933 caffe::SGDSolver<>::PreSolve()
    @           0x40c529 caffe::GetSolver<>()
    @           0x4063b3 train()
    @           0x404951 main
    @     0x7fad404eaec5 (unknown)
    @           0x404efd (unknown)
    @                0x0 (unknown)

The full protobuf message is as follows:

// NOTE
// Update the next available ID when you add a new SolverParameter field.
//
// SolverParameter next available ID: 36 (last added: clip_gradients)
message SolverParameter {
//////////////////////////////////////////////////////////////////////////////
// Specifying the train and test networks
//
// Exactly one train net must be specified using one of the following fields:
//     train_net_param, train_net, net_param, net
// One or more test nets may be specified using any of the following fields:
//     test_net_param, test_net, net_param, net
// If more than one test net field is specified (e.g., both net and
// test_net are specified), they will be evaluated in the field order given
// above: (1) test_net_param, (2) test_net, (3) net_param/net.
// A test_iter must be specified for each test_net.
// A test_level and/or a test_stage may also be specified for each test_net.
//////////////////////////////////////////////////////////////////////////////

// Proto filename for the train net, possibly combined with one or more
// test nets.
optional string net = 24;
// Inline train net param, possibly combined with one or more test nets.
optional NetParameter net_param = 25;

optional string train_net = 1; // Proto filename for the train net.
repeated string test_net = 2; // Proto filenames for the test nets.
optional NetParameter train_net_param = 21; // Inline train net params.
repeated NetParameter test_net_param = 22; // Inline test net params.

// The states for the train/test nets. Must be unspecified or
// specified once per net.
//
// By default, all states will have solver = true;
// train_state will have phase = TRAIN,
// and all test_state's will have phase = TEST.
// Other defaults are set according to the NetState defaults.
optional NetState train_state = 26;
repeated NetState test_state = 27;

// The number of iterations for each test net.
repeated int32 test_iter = 3;     

// The number of iterations between two testing phases.
optional int32 test_interval = 4 [default = 0];
optional bool test_compute_loss = 19 [default = false];
// If true, run an initial test pass before the first iteration,
// ensuring memory availability and printing the starting value of the loss.
optional bool test_initialization = 32 [default = true];
optional float base_lr = 5; // The base learning rate
// the number of iterations between displaying info. If display = 0, no info
// will be displayed.
//optional int32 start_lr_policy = 36; // Iteration to start CLR policy described in arXiv:1506.01186v2
//optional float max_lr = 37; //Maximum learning rate for CLR policy
optional int32 display = 6;
// Display the loss averaged over the last average_loss iterations
optional int32 average_loss = 33 [default = 1];
optional int32 max_iter = 7; // the maximum number of iterations
optional string lr_policy = 8; // The learning rate decay policy.
optional float gamma = 9; // The parameter to compute the learning rate.
optional float power = 10; // The parameter to compute the learning rate.
optional float momentum = 11; // The momentum value.
optional float weight_decay = 12; // The weight decay.
// regularization types supported: L1 and L2
// controlled by weight_decay
optional string regularization_type = 29 [default = "L2"];
// the stepsize for learning rate policy "step"
optional int32 stepsize = 13;
// the stepsize for learning rate policy "multistep"
repeated int32 stepvalue = 34;

// Set clip_gradients to >= 0 to clip parameter gradients to that L2 norm,
// whenever their actual L2 norm is larger.
optional float clip_gradients = 35 [default = -1];

optional int32 snapshot = 14 [default = 0]; // The snapshot interval
optional string snapshot_prefix = 15; // The prefix for the snapshot.
// whether to snapshot diff in the results or not. Snapshotting diff will help
// debugging but the final protocol buffer size will be much larger.
optional bool snapshot_diff = 16 [default = false];
// the mode the solver will use: 0 for CPU and 1 for GPU. Use GPU by default.
enum SolverMode {
  CPU = 0;
  GPU = 1;
}
optional SolverMode solver_mode = 17 [default = GPU];
// the device_id that will be used in GPU mode. Use device_id = 0 by default.
optional int32 device_id = 18 [default = 0];
// If non-negative, the seed with which the Solver will initialize the Caffe
// random number generator -- useful for reproducible results. Otherwise,
// (and by default) initialize using a seed derived from the system clock.
optional int64 random_seed = 20 [default = -1];

// Solver type
enum SolverType {
  SGD = 0;
  NESTEROV = 1;
  ADAGRAD = 2;
}
optional SolverType solver_type = 30 [default = SGD];
// numerical stability for AdaGrad
optional float delta = 31 [default = 1e-8];

// If true, print information about the state of the net that may help with
// debugging learning problems.
optional bool debug_info = 23 [default = false];

// If false, don't save a snapshot after training finishes.
optional bool snapshot_after_train = 28 [default = true];

optional int32 start_lr_policy = 36; // Iteration to start CLR policy described in arXiv:1506.01186v2
optional float max_lr = 37; // Maximum learning rate for CLR policy

}

Also, surprisingly (to me), if the two new lines

   optional int32 start_lr_policy = 36; // Iteration to start CLR policy described in arXiv:1506.01186v2
   optional float max_lr = 37; // Maximum learning rate for CLR policy

are inserted just after

   optional float base_lr = 5; // The base learning rate

then the error I get instead points to a misreading/misparsing of the solver_type field (field number 30).
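
If I understand the proto2 wire format correctly, that shouldn't even be possible: every serialized value is keyed by (field_number << 3) | wire_type, so where a field is declared in the .proto file is irrelevant, and the new field numbers cannot collide with solver_type's. A quick tag computation (my own illustration, not Caffe code):

    #include <cstdio>

    // proto2 keys each serialized value by (field_number << 3) | wire_type;
    // declaration order in the .proto never enters into it.
    int main() {
      const int kVarint = 0;   // wire type for int32 and enum fields
      const int kFixed32 = 5;  // wire type for float fields
      std::printf("solver_type (30):     tag %d\n", (30 << 3) | kVarint);   // 240
      std::printf("start_lr_policy (36): tag %d\n", (36 << 3) | kVarint);   // 288
      std::printf("max_lr (37):          tag %d\n", (37 << 3) | kFixed32);  // 301
      return 0;
    }

which only deepens my confusion.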

Does this make sense to anyone? Is there something obviously wrong with the way I'm inserting my two new fields? Is there some other code section I need to modify?
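
For reference, this is roughly how I intend to consume the two fields once they parse correctly. It is a sketch only: the helper name ClrRate and its placement are mine, and I reuse the existing stepsize field as the CLR half-cycle length; nothing like this exists in Caffe yet.

    #include <algorithm>
    #include <cmath>

    #include "caffe/proto/caffe.pb.h"

    // Hypothetical helper: triangular CLR schedule from arXiv:1506.01186,
    // driven by the two new SolverParameter fields.
    float ClrRate(const caffe::SolverParameter& param, int iter) {
      // Before the CLR window opens, fall back to the plain base rate.
      if (!param.has_start_lr_policy() || iter < param.start_lr_policy()) {
        return param.base_lr();
      }
      const int i = iter - param.start_lr_policy();
      const int step = param.stepsize();  // half-cycle length, reused field
      const int cycle = 1 + i / (2 * step);
      const float x = std::fabs(static_cast<float>(i) / step - 2 * cycle + 1);
      return param.base_lr() +
             (param.max_lr() - param.base_lr()) * std::max(0.0f, 1.0f - x);
    }

Whether this belongs in SGDSolver<Dtype>::GetLearningRate() or somewhere else entirely is part of what I'm asking.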

Thanks....

• Is it possible that code compiled against the old version of the protocol is being linked into the same binary (or dynamically loaded into the same process) as code compiled against the new version? There can only be one version of the protobuf loaded into any particular process, and *all* code in the process has to be compiled against that same version in order to agree on memory layout. – Kenton Varda Sep 16 '15 at 00:27
• Have you compiled Caffe after making these changes? How do you read these parameters in the C++ code? – Shai Sep 16 '15 at 05:36
• Moreover, you might find [this comment](http://stackoverflow.com/questions/30033096/what-is-lr-policy-in-caffe#comment52211580_30045244) relevant. – Shai Sep 16 '15 at 05:38
• @Kenton Varda - That sounds reasonable, particularly since my errors seem to be affected by where I put the new text in the proto file. But I do recompile Caffe using 'make all' and then do training from scratch. Maybe I should look more carefully for something like this. – user1245262 Sep 16 '15 at 14:26
• @Shai - Thanks, that comment is helpful. I'd seen that newer versions of Caffe have more methods for determining step size, but this also gives me a sense of how they compare to CLR. – user1245262 Sep 16 '15 at 14:28

0 Answers