The GitHub page of Caffe contains a Windows branch. I have taken this branch and built a Windows DLL from it. The DLL is loosely based on https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp.

The DLL works and outputs correct classification results, but it is 1.5 to 5 times slower than the pyCaffe interface. Interestingly, pyCaffe takes around 1 second for four images using AlexNet on every computer we tested, while the DLL takes 1.5, 2 or 4 seconds, depending on the computer.

We measured the time immediately before and after the layer loop (using the technique from "Easily measure elapsed time") in

template <typename Dtype> Dtype Net<Dtype>::ForwardFromTo(int start, int end)

This function resides in https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp and is called by both the C++ and the Python code.
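
A minimal sketch of such a measurement with std::chrono::steady_clock, assuming a C++11 standard library whose steady_clock has adequate resolution (the clocks shipped with Visual Studio 2012 and 2013 are reportedly coarse, in which case QueryPerformanceCounter is the usual fallback). elapsed_ms is a hypothetical helper, not part of Caffe:

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: returns the wall-clock time, in milliseconds, of one
// call of `fn`. In the DLL, `fn` would wrap the call to ForwardFromTo; here
// it is any callable so that the sketch stays self-contained.
template <typename Fn>
double elapsed_ms(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `double ms = elapsed_ms([&] { net->ForwardFromTo(0, last_layer); });`, where net and last_layer are placeholders for the DLL's own caffe::Net<float> pointer and final layer index.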

We compiled Caffe as a 32-bit program without GPU support using Visual Studio 2013.

Things we have checked so far:

  • Compiler Optimizations
  • The data
  • OS and computer configurations (like CPU/Memory etc.)
  • We measured multiple times within one execution so that the benchmark is more stable.
  • We also profiled the code with CodeXL but could not find anything unusual, although that is admittedly a vague result.
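
Repeating the measurement within one execution can be made explicit with a small helper that discards warm-up calls and reports the median of the remaining runs. median_ms and median are hypothetical helpers, and the single default warm-up call and the choice of the median over the mean are our assumptions, not details of the original measurements:

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of samples; less sensitive to one-off outliers
// (a background process, a page fault) than the mean.
double median(std::vector<double> samples) {
  if (samples.empty()) return 0.0;
  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  return samples.size() % 2 == 1 ? samples[mid]
                                 : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Calls `fn` `warmup` times untimed (to absorb first-call effects such as
// lazy allocation), then `runs` times timed; returns the median in ms.
template <typename Fn>
double median_ms(Fn fn, std::size_t runs, std::size_t warmup = 1) {
  for (std::size_t i = 0; i < warmup; ++i) fn();
  std::vector<double> samples;
  samples.reserve(runs);
  for (std::size_t i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(stop - start).count());
  }
  return median(samples);
}
```

Comparing the median of, say, 20 runs between the DLL and pyCaffe is less likely to be skewed by a single slow run than comparing one measurement each.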
Asked by Lefix

Answer

We concluded the following: Caffe uses GLog, and GLog has fatal checks, which may look like this:

CHECK(a <= b) << "a must be less than or equal to b";

These checks crash the program and are hard to catch. For that reason we created a class to replace GLog. It is fairly simple and uses std::stringstream. Google did something clever here: whenever the condition is true, the right-hand side of << is not evaluated.

https://github.com/google/glog/blob/de6149ef8e67b064a433a8b88924fa9f606ad5d5/src/windows/glog/logging.h#L569

They solved it with a conditional expression whose passing branch is (void) 0. We missed that part, so our replacement evaluated the << chain on every call, even for passing checks. When I wanted to post the profiling data here, I noticed that some time was lost in the << operator. We then looked at the profiling data more closely and increased the number of function calls so that every number became bigger and clearer. This led us to the solution.
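
The difference can be sketched as follows, assuming the goal is a check that throws a catchable exception instead of aborting. NaiveCheck, CheckFailure, Voidify, NAIVE_CHECK and LAZY_CHECK are hypothetical names. NaiveCheck is an illustration of the eager pattern, not our actual class, and LAZY_CHECK only mimics the shape of GLog's macro (a ?: whose passing branch is (void) 0). It is not GLog's code, and throwing from a destructor is our own choice for making failures catchable:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Eager variant: a stream object is constructed and every operand after <<
// is evaluated on every call, even when the check passes.
class NaiveCheck {
 public:
  NaiveCheck(bool ok, const char* expr) : ok_(ok) {
    if (!ok_) stream_ << "Check failed: " << expr << ' ';
  }
  template <typename T>
  NaiveCheck& operator<<(const T& value) { stream_ << value; return *this; }
  ~NaiveCheck() noexcept(false) {
    if (!ok_) throw std::runtime_error(stream_.str());
  }
 private:
  bool ok_;
  std::ostringstream stream_;
};
#define NAIVE_CHECK(cond) NaiveCheck((cond), #cond)

// Only ever constructed on failure; throws the collected message when the
// full expression ends.
class CheckFailure {
 public:
  explicit CheckFailure(const char* expr) {
    stream_ << "Check failed: " << expr << ' ';
  }
  template <typename T>
  CheckFailure& operator<<(const T& value) { stream_ << value; return *this; }
  ~CheckFailure() noexcept(false) { throw std::runtime_error(stream_.str()); }
 private:
  std::ostringstream stream_;
};

// & binds more loosely than <<, so the whole << chain becomes its operand;
// returning void gives both branches of ?: the same type.
struct Voidify {
  void operator&(const CheckFailure&) const {}
};

// Lazy variant: a passing check takes the (void) 0 branch, so neither the
// stream nor anything to the right of the macro is evaluated.
#define LAZY_CHECK(cond) \
  (cond) ? (void) 0 : Voidify() & CheckFailure(#cond)
```

With the lazy form, a passing check costs one comparison, and the std::ostringstream is only constructed when the check fails. Note that an exception thrown from a destructor while another exception is already propagating calls std::terminate.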

Answered by Lefix