I'm trying to learn (and compare) different deep learning frameworks; at the moment those are Caffe and Theano.

http://caffe.berkeleyvision.org/gathered/examples/mnist.html

and

http://deeplearning.net/tutorial/lenet.html

I followed the tutorials to run both frameworks on the MNIST dataset. However, I noticed quite a difference in terms of accuracy and performance.

With Caffe, the accuracy builds up to ~97% extremely fast. In fact, the whole program finishes in only 5 minutes (using a GPU), with a final accuracy on the test set of over 99%. How impressive!

With Theano, however, the results are much poorer. It took me more than 46 minutes (using the same GPU) just to reach 92% test accuracy.

I'm confused, as there should not be such a large difference between two frameworks running essentially the same architecture on the same dataset.

So my question is: is the accuracy number reported by Caffe the percentage of correct predictions on the test set? If so, is there any explanation for the discrepancy?

Thanks.

hosyvietanh
  • In addition to what I wrote below, have you checked that your GPU is actually being used for the Theano code? If you haven't already, you could try running the code here to check: http://deeplearning.net/software/theano/tutorial/using_gpu.html I remember it being trickier with Theano than with Caffe. – Chrigi Feb 16 '16 at 14:16
  • @Chrigi: Yes, I checked and it runs on the GPU, hence my surprise at the difference in performance. – hosyvietanh Feb 16 '16 at 19:10

1 Answer

The examples for Theano and Caffe are not exactly the same network. Two key differences which I can think of are that the Theano example uses sigmoid/tanh activation functions, while the Caffe tutorial uses the ReLU activation function, and that the Theano code uses normal minibatch gradient descent while Caffe uses a momentum optimiser. Both differences will significantly affect the training time of your network. And using the ReLU unit will likely also affect the accuracy.
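
To make the two differences concrete, here is a minimal sketch (my own illustration, not the tutorials' actual code) of how they would look in Theano; the layer size, stand-in cost, and learning rate are arbitrary:

```python
import numpy as np
import theano
import theano.tensor as T

# Toy fully-connected layer: activation(x.W + b)
rng = np.random.RandomState(0)
W = theano.shared(rng.randn(784, 10).astype(theano.config.floatX), name='W')
b = theano.shared(np.zeros(10, dtype=theano.config.floatX), name='b')
x = T.matrix('x')
pre = T.dot(x, W) + b

out_sigmoid = T.nnet.sigmoid(pre)   # Theano tutorial style
out_relu = T.maximum(0., pre)       # Caffe style (newer Theano also has T.nnet.relu)

cost = out_relu.sum()               # stand-in scalar cost, just to have a gradient
grad_W = T.grad(cost, W)
lr = 0.1

# Plain minibatch SGD, as in the Theano tutorial:
sgd_updates = [(W, W - lr * grad_W)]

# Momentum, as in Caffe's solver (schematically):
v = theano.shared(np.zeros_like(W.get_value()), name='v')
v_new = 0.9 * v - lr * grad_W
momentum_updates = [(v, v_new), (W, W + v_new)]

train_sgd = theano.function([x], cost, updates=sgd_updates)
train_momentum = theano.function([x], cost, updates=momentum_updates)
```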

Note that Caffe is a deep learning framework which already has ready-to-use functions for many commonly used things like the momentum optimiser. Theano, on the other hand, is a symbolic maths library which can be used to build neural networks. However, it is not a deep learning framework.
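
In Caffe, for instance, the momentum optimiser is ready-made and you only configure it. The snippet below is a shortened sketch along the lines of the MNIST example's lenet_solver.prototxt (field values from memory, so check the actual file in the Caffe repository):

```
net: "examples/mnist/lenet_train_test.prototxt"
base_lr: 0.01        # starting learning rate
momentum: 0.9        # the momentum optimiser is one configuration line
weight_decay: 0.0005
max_iter: 10000
solver_mode: GPU
```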

The Theano tutorial you mentioned is an excellent resource to understand how exactly convolutional and other neural networks work on a basic level. However, it will be cumbersome to implement all the state-of-the-art tweaks. If you want to get state-of-the-art results quickly you are better off using one of the existing deep learning frameworks. Apart from Caffe, there are a number of frameworks based on Theano. I know of keras, blocks, pylearn2, and my personal favourite lasagne.
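
For example, with lasagne a small LeNet-style network with ReLU activations and a momentum optimiser takes only a few calls. A rough sketch, assuming lasagne's layer and update helpers (the filter counts and learning rate here are arbitrary):

```python
import lasagne
import theano
import theano.tensor as T

x = T.tensor4('x')   # batch of 1x28x28 MNIST images
y = T.ivector('y')   # integer class labels

# LeNet-style network with ReLU activations
net = lasagne.layers.InputLayer((None, 1, 28, 28), input_var=x)
net = lasagne.layers.Conv2DLayer(net, num_filters=20, filter_size=5,
                                 nonlinearity=lasagne.nonlinearities.rectify)
net = lasagne.layers.MaxPool2DLayer(net, pool_size=2)
net = lasagne.layers.DenseLayer(net, num_units=10,
                                nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(net)
loss = lasagne.objectives.categorical_crossentropy(prediction, y).mean()
params = lasagne.layers.get_all_params(net, trainable=True)

# The momentum optimiser is a single library call here:
updates = lasagne.updates.momentum(loss, params, learning_rate=0.01, momentum=0.9)
train_fn = theano.function([x, y], loss, updates=updates)
```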

Chrigi
  • Thanks, but I have one concern about the activation function: so in the Caffe example, is a neuron in the convolutional layer just the sum of weight * previous-layer output (within the kernel window), without a sigmoid/tanh function applied to the sum? – hosyvietanh Feb 16 '16 at 19:09
  • @hosyvietanh Caffe is using the ReLU activation, so it is doing something like max(0, Wx+b), where x is the input to the layer, whereas the Theano example is doing sigmoid(Wx+b). – Indie AI Feb 17 '16 at 15:00
  • To be honest, I am a bit puzzled by the absence of ReLU activations after the conv layers in the Caffe LeNet example. Every convolutional layer should definitely be followed by some activation function; in the original LeNet paper it is the tanh function. In other Caffe examples they also follow conv layers with activation functions: for example, you can see that they put ReLUs after each conv layer in the [Imagenet example](https://github.com/sguada/caffe-public/blob/master/models/imagenet.prototxt). I would suggest you post this as a new question. – Chrigi Feb 17 '16 at 20:52
  • @Chrigi: Yeah I feel extremely confused as well. I posted another question about it here, in case you're interested: http://stackoverflow.com/questions/35533703/hard-to-understand-caffe-mnist-example – hosyvietanh Feb 21 '16 at 07:59