I'm doing binary classification with a CNN using Matconvnet. Now I'm trying to reproduce it with Keras in Python. The network is not complex at all, and I achieved 96% accuracy with Matconvnet. However, with Keras, even though I tried my best to make every setting exactly the same as before, I can't get the same result. Even worse, the model doesn't work at all.
Here are some details about the settings. Any ideas or help would be appreciated!
Input
The images are 20*20. The training set has 400 images, the test set 100, and the validation set 132.
- Matconvnet: images stored as a 20*20*sample_size array
- Keras: images stored as a sample_size*20*20*1 array
CNN Structure
(3*3)*3 conv - (2*2) max pooling - fully connected - softmax - log loss
Matconvnet: uses a convolutionized layer instead of a fully connected one. Here is the code:
function net = initializeCNNA()
f = 1/100 ;
net.layers = {} ;
% 3x3 conv, 3 filters
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{f*randn(3,3,1,3, 'single'), zeros(1, 3, 'single')}}, ...
                           'stride', 1, ...
                           'pad', 0) ;
% 2x2 max pooling, stride 2
net.layers{end+1} = struct('type', 'pool', ...
                           'method', 'max', ...
                           'pool', [2 2], ...
                           'stride', 2, ...
                           'pad', 0) ;
% 9x9 conv acting as the fully connected layer (9x9x3 -> 1x1x2)
net.layers{end+1} = struct('type', 'conv', ...
                           'weights', {{f*randn(9,9,3,2, 'single'), zeros(1,2,'single')}}, ...
                           'stride', 1, ...
                           'pad', 0) ;
% softmax + log loss
net.layers{end+1} = struct('type', 'softmaxloss') ;
net = vl_simplenn_tidy(net) ;
Keras:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(3, (3, 3),
                 kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=None),
                 input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(2, activation='softmax',
                kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1, seed=None)))
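For reference, here is a sketch (my own illustration, not my actual code) of a Keras model that mirrors the Matconvnet network more literally, replacing Flatten + Dense with a 9*9 convolution over the 9*9 pooled feature map; mathematically this is the same linear map as the fully connected layer:

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Activation

init = keras.initializers.RandomNormal(mean=0.0, stddev=0.1)

model_conv = Sequential()
model_conv.add(Conv2D(3, (3, 3), kernel_initializer=init, input_shape=(20, 20, 1)))  # 20x20x1 -> 18x18x3
model_conv.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))                       # 18x18x3 -> 9x9x3
model_conv.add(Conv2D(2, (9, 9), kernel_initializer=init))                           # 9x9x3 -> 1x1x2, like the 9x9 conv in Matconvnet
model_conv.add(Flatten())                                                            # 1x1x2 -> 2
model_conv.add(Activation('softmax'))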
Loss Function
- Matconvnet: softmaxloss
- Keras: binary_crossentropy
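As far as I understand, Matconvnet's softmaxloss is a softmax followed by a multi-class cross-entropy, so with a 2-unit softmax output the closer Keras counterpart would be categorical_crossentropy on one-hot labels. A minimal sketch of that variant (y_train is assumed to hold integer labels 0/1, and model is the Keras network above):

from keras import optimizers
from keras.utils import to_categorical

y_train_onehot = to_categorical(y_train, num_classes=2)  # one-hot labels for the 2-way softmax (y_train assumed)
sgd = optimizers.SGD(lr=0.001, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])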
Optimizer
Matconvnet: SGD
trainOpts.batchSize = 50 ;
trainOpts.numEpochs = 20 ;
trainOpts.learningRate = 0.001 ;
trainOpts.weightDecay = 0.0005 ;
trainOpts.momentum = 0.9 ;
Keras: SGD
from keras import optimizers

sgd = optimizers.SGD(lr=0.001, momentum=0.9, decay=0.0005)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
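One mapping I'm unsure about: in Keras, the decay argument of optimizers.SGD is a per-update learning-rate decay, while Matconvnet's weightDecay is an L2 penalty on the weights. If the intent is to mimic weightDecay, a sketch would be to put an L2 regularizer on the layers instead and drop decay from the optimizer (the 0.0005 coefficient is my guess at the mapping; the exact scaling convention may differ between the two frameworks):

import keras
from keras import optimizers, regularizers
from keras.layers import Dense

# the output layer would then be built like this (same idea for the Conv2D layer):
output_layer = Dense(2, activation='softmax',
                     kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1),
                     kernel_regularizer=regularizers.l2(0.0005))  # L2 penalty standing in for weightDecay

# and the optimizer without decay, since that argument is learning-rate decay:
sgd = optimizers.SGD(lr=0.001, momentum=0.9)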
- Initialization: filters: N(0, 0.1), biases: 0
- Normalization: no batch normalization; only the input images are normalized to have 0 mean and 1 std (a sketch of this preprocessing is below).
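For completeness, the input normalization I mean is roughly the following (a sketch; X_train and X_test are assumed to be the NumPy image arrays):

# standardize with the training-set statistics (variable names assumed)
mean = X_train.mean()
std = X_train.std()
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std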
Above are the aspects I reviewed to make sure my replication is correct. Yet I don't understand why it doesn't work in Keras. Here are some guesses:
- Matconvnet uses a convolutionized layer instead of a fully connected layer, which may imply some special way of updating those parameters.
- The two frameworks implement SGD differently, so parameters with the same name may have different meanings.
I also tried other things:
- Changed the optimizer in Keras to Adadelta(). No improvement.
- Changed the network structure and made it deeper. It works!
But I still want to know why Matconvnet can achieve such a good result with a much simpler structure.