The GitHub page of Caffe contains a Windows branch. I took this branch and built a Windows DLL from it. It is loosely based on https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp.
The DLL works and produces correct classification results, but it is 1.5 to 5 times slower than the pyCaffe interface. Interestingly, pyCaffe takes around 1 second for four images with AlexNet on every computer we tested, whereas the DLL takes 1.5, 2, or 4 seconds depending on the machine.
We measured the elapsed time immediately before and after the loop (using the approach from "Easily measure elapsed time") in
template <typename Dtype> Dtype Net<Dtype>::ForwardFromTo(int start, int end)
This function resides in https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp and is called by both the C++ and the Python code.
We compiled Caffe as a 32-bit program without GPU support using Visual Studio 2013.
Things we have checked so far:
- Compiler optimizations
- The data
- OS and computer configuration (CPU, memory, etc.)
- We measured multiple times within one execution so that the benchmark is more stable.
- We profiled the code using CodeXL but could not find anything unusual, although that is admittedly a little vague.