Hi all,
I am running into problems when using batch normalization in Caffe. Here is the relevant part of my train_val.prototxt:
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "conv0"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  convolution_param {
    num_output: 32
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.0589
    }
    bias_filler {
      type: "constant"
      value: 0
    }
    engine: CUDNN
  }
}
layer {
  name: "bnorm1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    use_global_stats: false
  }
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv16"
  type: "Convolution"
  bottom: "conv1"
  top: "conv16"
  param {
    lr_mult: 1
    decay_mult: 1
  }
However, the training does not converge. If I remove the BN layers (BatchNorm + Scale), it converges, so I started comparing the log files with and without the BN layers, with debug_info = true set in the solver (the relevant solver lines are shown right below).
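This is how the debug logging is switched on; only the relevant solver fields are shown, and the net path is a placeholder for my actual file:

# solver.prototxt (excerpt)
net: "train_val.prototxt"
debug_info: true   # print per-blob data/diff statistics for every layer at each iteration

Here are the resulting forward-pass logs.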
With BN:
I0804 10:22:42.074671 8318 net.cpp:638] [Forward] Layer loadtestdata, top blob data data: 0.368457
I0804 10:22:42.074757 8318 net.cpp:638] [Forward] Layer loadtestdata, top blob label data: 0.514496
I0804 10:22:42.076117 8318 net.cpp:638] [Forward] Layer conv0, top blob conv0 data: 0.115678
I0804 10:22:42.076200 8318 net.cpp:650] [Forward] Layer conv0, param blob 0 data: 0.0455077
I0804 10:22:42.076273 8318 net.cpp:650] [Forward] Layer conv0, param blob 1 data: 0
I0804 10:22:42.076539 8318 net.cpp:638] [Forward] Layer relu0, top blob conv0 data: 0.0446758
I0804 10:22:42.078435 8318 net.cpp:638] [Forward] Layer conv1, top blob conv1 data: 0.0675479
I0804 10:22:42.078516 8318 net.cpp:650] [Forward] Layer conv1, param blob 0 data: 0.0470226
I0804 10:22:42.078589 8318 net.cpp:650] [Forward] Layer conv1, param blob 1 data: 0
I0804 10:22:42.079108 8318 net.cpp:638] [Forward] Layer bnorm1, top blob conv1 data: 0
I0804 10:22:42.079197 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 0 data: 0
I0804 10:22:42.079270 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 1 data: 0
I0804 10:22:42.079350 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 2 data: 0
I0804 10:22:42.079421 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 3 data: 0
I0804 10:22:42.079505 8318 net.cpp:650] [Forward] Layer bnorm1, param blob 4 data: 0
I0804 10:22:42.080267 8318 net.cpp:638] [Forward] Layer scale1, top blob conv1 data: 0
I0804 10:22:42.080345 8318 net.cpp:650] [Forward] Layer scale1, param blob 0 data: 1
I0804 10:22:42.080418 8318 net.cpp:650] [Forward] Layer scale1, param blob 1 data: 0
I0804 10:22:42.080651 8318 net.cpp:638] [Forward] Layer relu1, top blob conv1 data: 0
I0804 10:22:42.082074 8318 net.cpp:638] [Forward] Layer conv16, top blob conv16 data: 0
I0804 10:22:42.082154 8318 net.cpp:650] [Forward] Layer conv16, param blob 0 data: 0.0485365
I0804 10:22:42.082226 8318 net.cpp:650] [Forward] Layer conv16, param blob 1 data: 0
I0804 10:22:42.082675 8318 net.cpp:638] [Forward] Layer loss, top blob loss data: 42.0327
Without BN:
I0803 17:01:29.700850 30274 net.cpp:638] [Forward] Layer loadtestdata, top blob data data: 0.320584
I0803 17:01:29.700920 30274 net.cpp:638] [Forward] Layer loadtestdata, top blob label data: 0.236383
I0803 17:01:29.701556 30274 net.cpp:638] [Forward] Layer conv0, top blob conv0 data: 0.106141
I0803 17:01:29.701633 30274 net.cpp:650] [Forward] Layer conv0, param blob 0 data: 0.0467062
I0803 17:01:29.701692 30274 net.cpp:650] [Forward] Layer conv0, param blob 1 data: 0
I0803 17:01:29.701835 30274 net.cpp:638] [Forward] Layer relu0, top blob conv0 data: 0.0547961
I0803 17:01:29.702193 30274 net.cpp:638] [Forward] Layer conv1, top blob conv1 data: 0.0716117
I0803 17:01:29.702267 30274 net.cpp:650] [Forward] Layer conv1, param blob 0 data: 0.0473551
I0803 17:01:29.702327 30274 net.cpp:650] [Forward] Layer conv1, param blob 1 data: 0
I0803 17:01:29.702425 30274 net.cpp:638] [Forward] Layer relu1, top blob conv1 data: 0.0318472
I0803 17:01:29.702781 30274 net.cpp:638] [Forward] Layer conv16, top blob conv16 data: 0.0403702
I0803 17:01:29.702847 30274 net.cpp:650] [Forward] Layer conv16, param blob 0 data: 0.0474007
I0803 17:01:29.702908 30274 net.cpp:650] [Forward] Layer conv16, param blob 1 data: 0
I0803 17:01:29.703228 30274 net.cpp:638] [Forward] Layer loss, top blob loss data: 11.2245
It is strange that in the forward pass, every layer starting from the BatchNorm layer outputs 0! It is also worth mentioning that ReLU (an in-place layer) gets only one line in the log, while BatchNorm and Scale (supposedly also in-place layers) get 6 and 3 lines respectively (top blob plus param blobs). Does anyone know what the problem is?