12

I would like to compile / configure Caffe so that when I train an artificial neural network with it, the training is multi-threaded (CPU only, no GPU). How can I enable multithreading with Caffe? I use Caffe on Ubuntu 14.04 LTS x64.

Franck Dernoncourt

3 Answers

22

One way is to use OpenBLAS instead of the default ATLAS. To do so,

  1. sudo apt-get install -y libopenblas-dev
  2. Before compiling Caffe, edit Makefile.config and replace BLAS := atlas with BLAS := open
  3. After compiling Caffe, run export OPENBLAS_NUM_THREADS=4 in the shell that launches Caffe; this will cause Caffe to use 4 cores.
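
As a rough sketch of step 3, the variable can be set in the shell that launches Caffe. The cap at `nproc` and the commented training command (with a placeholder solver path) are my own illustration, not part of the steps above.

```shell
# Thread count from the answer; capped at the number of cores the machine reports.
requested=4
available=$(nproc)
if [ "$requested" -gt "$available" ]; then
    threads=$available
else
    threads=$requested
fi

# OpenBLAS reads this variable when the process starts.
export OPENBLAS_NUM_THREADS=$threads
echo "OpenBLAS will use $OPENBLAS_NUM_THREADS thread(s)"

# Hypothetical training command; the solver path is a placeholder:
# ./build/tools/caffe train --solver=path/to/solver.prototxt
```

Because the variable is exported, every program started from this shell inherits it; a new shell needs the export again unless it is added to a startup file.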

If interested, here is a script to install Caffe and pycaffe on a new Ubuntu 14.04 LTS x64 or Ubuntu 14.10 x64. CPU only, multi-threaded Caffe. It can probably be improved, but it's good enough for me for now:

# This script installs Caffe and pycaffe on Ubuntu 14.04 x64 or 14.10 x64. CPU only, multi-threaded Caffe.
# Usage: 
# 0. Set up here how many cores you want to use during the installation:
# By default Caffe will use all these cores.
NUMBER_OF_CORES=4
# 1. Execute this script, e.g. "bash compile_caffe_ubuntu_14.04.sh" (~30 to 60 minutes on a new Ubuntu).
# 2. Open a new shell (or run "source ~/.bash_profile"). You're done. You can try 
#    running "import caffe" from the Python interpreter to test.

#http://caffe.berkeleyvision.org/install_apt.html : (general install info: http://caffe.berkeleyvision.org/installation.html)
cd
sudo apt-get update
#sudo apt-get upgrade -y # If you are OK getting prompted
sudo DEBIAN_FRONTEND=noninteractive apt-get upgrade -y -q -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" # If you are OK with all defaults

sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev
sudo apt-get install -y --no-install-recommends libboost-all-dev
sudo apt-get install -y libatlas-base-dev 
sudo apt-get install -y python-dev 
sudo apt-get install -y python-pip git

# For Ubuntu 14.04
sudo apt-get install -y libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler 

# LMDB
# https://github.com/BVLC/caffe/issues/2729: Temporarily broken link to the LMDB repository #2729
#git clone https://gitorious.org/mdb/mdb.git
#cd mdb/libraries/liblmdb
#make && make install 

git clone https://github.com/LMDB/lmdb.git 
cd lmdb/libraries/liblmdb
sudo make 
sudo make install

# More pre-requisites 
sudo apt-get install -y cmake unzip doxygen
sudo apt-get install -y protobuf-compiler
sudo apt-get install -y libffi-dev python-dev build-essential
sudo pip install lmdb
sudo pip install numpy
sudo apt-get install -y python-numpy
sudo apt-get install -y gfortran # required by scipy
sudo pip install scipy # required by scikit-image
sudo apt-get install -y python-scipy # in case pip failed
sudo apt-get install -y python-nose
sudo pip install scikit-image # to fix https://github.com/BVLC/caffe/issues/50


# Get caffe (http://caffe.berkeleyvision.org/installation.html#compilation)
cd
mkdir caffe
cd caffe
wget https://github.com/BVLC/caffe/archive/master.zip
unzip -o master.zip
cd caffe-master

# Prepare Python binding (pycaffe)
cd python
for req in $(cat requirements.txt); do sudo pip install $req; done
echo "export PYTHONPATH=$(pwd):$PYTHONPATH " >> ~/.bash_profile # to be able to call "import caffe" from Python after reboot
source ~/.bash_profile # Update shell 
cd ..

# Compile caffe and pycaffe
cp Makefile.config.example Makefile.config
sed -i '8s/.*/CPU_ONLY := 1/' Makefile.config # Line 8: CPU only
sudo apt-get install -y libopenblas-dev
sed -i '33s/.*/BLAS := open/' Makefile.config # Line 33: to use OpenBLAS
# Note that if one day the Makefile.config changes and these line numbers change, we're screwed
# Maybe it would be best to simply append those changes at the end of Makefile.config 
echo "export OPENBLAS_NUM_THREADS=$NUMBER_OF_CORES" >> ~/.bash_profile # No parentheses: bash would treat (4) as an array, and arrays are not exported
mkdir build
cd build
cmake ..
cd ..
make all -j$NUMBER_OF_CORES # 4 is the number of parallel threads for compilation: typically equal to number of physical cores
make pycaffe -j$NUMBER_OF_CORES
make test
make runtest
#make matcaffe
make distribute

# Bonus for other work with pycaffe
sudo pip install pydot
sudo apt-get install -y graphviz
sudo pip install scikit-learn

# At the end, you need to run "source ~/.bash_profile" manually or start a new shell to be able to do 'python import caffe', 
# because one cannot source in a bash script. (http://stackoverflow.com/questions/16011245/source-files-in-a-bash-script)

I have placed this script on GitHub:
https://github.com/Franck-Dernoncourt/caffe_demos/tree/master/caffe_installation .
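
The script's own comments note that the line-number-based sed edits break when Makefile.config changes. One possible alternative, sketched here against a tiny stand-in file, is to match on the line content instead. This assumes the config contains the lines `# CPU_ONLY := 1` and `BLAS := atlas`; check your copy of Makefile.config before relying on it.

```shell
# Build a small stand-in for Makefile.config (the real file is much longer).
cfg=$(mktemp)
printf '%s\n' '# CPU_ONLY := 1' 'BLAS := atlas' > "$cfg"

# Uncomment CPU_ONLY and switch the BLAS backend by matching content,
# so the edit does not depend on where the lines sit in the file.
sed -i -e 's/^# *CPU_ONLY := 1/CPU_ONLY := 1/' \
       -e 's/^BLAS := .*/BLAS := open/' "$cfg"

cat "$cfg"
```

In a real checkout, `cfg` would point at Makefile.config instead of a temporary file. The `-i` flag as used here is the GNU sed form found on Ubuntu.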

Franck Dernoncourt
  • How many cores will be used when libopenblas-dev installed by default? – mrgloom Sep 02 '15 at 10:45
  • @mrgloom, on my first test it only used one – PlagTag Oct 27 '15 at 15:44
  • The script works fine; however, I'm using the master branch of Caffe, where `sed -i '33s/.*/BLAS := open/' Makefile.config` must be `sed -i '46s/.*/BLAS := open/' Makefile.config`. – cagatayodabasi Mar 24 '16 at 16:09
  • @cagatayodabasi thanks I guess they changed the Makefile.config. I am using Theano nowadays, so I haven't tried recently. – Franck Dernoncourt Mar 24 '16 at 16:13
  • This did not work for me. Are you sure that `sudo apt-get install -y libopenblas-dev` is sufficient (i.e. no recompiling is needed)? Why do you also install ATLAS with `sudo apt-get install -y libatlas-base-dev`? What network topology did you use when you tested it? Do you see all CPUs utilised? – mrgloom May 19 '16 at 15:55
  • It seems `openblas` works, but not for all network architectures? http://stackoverflow.com/questions/37327064/caffe-multi-cpu-build – mrgloom May 20 '16 at 15:34
  • @mrgloom please take a look at intel branch after you pull the git. It has the code to optimize intel-based servers as well as enabling multi-threading using mkl. I'm able to train caffenet example and see 2500% CPU usage on my Broadwell box. – webbertiger Jul 02 '17 at 20:32
  • By the way, TensorFlow supports multi-CPU training and inference out of the box. – mrgloom Jul 02 '17 at 20:35
  • @mrgloom yes I now use Tensorflow :) – Franck Dernoncourt Jul 02 '17 at 20:38
  • Seems like `export OPENBLAS_NUM_THREADS=4` helps but scaling is 'weak': for my task inference (`export OPENBLAS_NUM_THREADS=8 600%, export OPENBLAS_NUM_THREADS=4 350%, export OPENBLAS_NUM_THREADS=2 200%`) – mrgloom Jul 21 '17 at 13:05
  • I tried this recently, but it gives segmentation faults unexpectedly during inference. I ran the same inference in a loop 10 times and it never got beyond 2-3 inferences. With `OPENBLAS_NUM_THREADS=1` (or 2) it works fine, but it starts breaking from 3 onwards. – Anupam Sobti Feb 15 '19 at 09:14
1

This just extends Franck's answer, where he used sed to modify the config file. If you are having problems with that, here is another way to get the same thing done.

The difference is that instead of changing the config file, you pass the options directly as cmake flags: cmake -DCPU_ONLY=1 -DBLAS=open ..

sudo apt update && sudo apt-get install -y libopenblas-dev
git clone -b 1.0 --depth 1 https://github.com/BVLC/caffe.git . && \
    pip install --upgrade pip && \
    cd python && pip install -r requirements.txt && cd .. && \
    mkdir build && cd build && \
    cmake -DCPU_ONLY=1 -DBLAS=open .. && \
    make -j"$(nproc)"
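
One way to check which BLAS library the build actually picked up is to inspect the shared libraries of the resulting binary. This is only a sketch: the helper name `linked_blas` is made up for illustration, and `build/tools/caffe` is an assumption about where the build above places the binary.

```shell
# Print the BLAS-related shared libraries a binary is linked against.
# Returns non-zero if the file is missing or no BLAS library is listed.
linked_blas() {
    [ -e "$1" ] || { echo "not found: $1" >&2; return 1; }
    ldd "$1" | grep -i -E 'openblas|atlas|mkl'
}

# Assumed location of the Caffe binary after the build.
linked_blas build/tools/caffe || echo "no BLAS library detected for build/tools/caffe"
```

If the output lists an ATLAS library rather than OpenBLAS, the BLAS option probably did not take effect and the thread-count variables will have no influence.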
Sumsuddin Shojib
0

When building Caffe, add -fopenmp to CXXFLAGS and LINKFLAGS to enable OpenMP. If your Makefile.config has a flag named OPENMP, you can simply set it to 1. You can use either OpenBLAS or the Intel MKL BLAS library. If you build OpenBLAS yourself, set the USE_OPENMP=1 flag so that it supports OpenMP. After building Caffe, set the number of threads to use at runtime with OMP_NUM_THREADS=n, where n is the number of threads you want. Here is a good discussion related to multi-threading in Caffe: https://github.com/BVLC/caffe/pull/439
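
A minimal sketch of those two steps, using a temporary stand-in for Makefile.config. It assumes that Caffe's Makefile extends CXXFLAGS and LINKFLAGS with `+=`, so that values appended in the config survive; verify that against your checkout.

```shell
# Stand-in for Makefile.config; in a real checkout, point cfg at that file instead.
cfg=$(mktemp)

# Append the OpenMP flag to the compiler and linker flags.
cat >> "$cfg" <<'EOF'
CXXFLAGS += -fopenmp
LINKFLAGS += -fopenmp
EOF

# Thread count read by the OpenMP runtime when Caffe starts (4 is an example value).
export OMP_NUM_THREADS=4
```

Caffe has to be rebuilt after the config change; the environment variable only takes effect at runtime.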

dipendra009
  • is this in master branch? can you link to the pull request where it is implemented? – Shai May 25 '17 at 04:43
  • what do you mean by where it is implemented? The CXXFLAGS and LINKFLAGS are in Makefile.config of every caffe code you clone/download from Github. – dipendra009 Jul 14 '17 at 15:50
  • I couldn't find where in the caffe code `USE_OPENMP` macro is used, I know it is *defined* via makefile, but I could not find where in the code this macro is actually queried. – Shai Jul 16 '17 at 05:37
  • The USE_OPENMP macro is not in the Caffe makefile; it is for enabling multithreading in OpenBLAS. If you are using OpenBLAS, enable the USE_OPENMP flag in its makefile to support multi-threading. – dipendra009 Jul 17 '17 at 17:23
  • What is difference between `OMP_NUM_THREADS` vs `OPENBLAS_NUM_THREADS` ? – mrgloom Jul 21 '17 at 13:07
  • OMP_NUM_THREADS is the environment variable to set the number of OpenMP threads in general. OPENBLAS_NUM_THREADS sets the number of threads for the OpenBLAS library. – dipendra009 Jul 21 '17 at 15:44
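
To illustrate the distinction, the sketch below sets the general OpenMP variable for the whole shell and the OpenBLAS-specific one for a single command only. The values 8 and 2 are arbitrary. As far as I recall from the OpenBLAS README, a build with USE_OPENMP=1 follows OMP_NUM_THREADS, while other builds prefer OPENBLAS_NUM_THREADS when both are set; treat that as an assumption and check the README for your version.

```shell
# General OpenMP setting, inherited by every program started from this shell.
export OMP_NUM_THREADS=8

# OpenBLAS-specific setting, visible only to the one command it prefixes.
OPENBLAS_NUM_THREADS=2 sh -c 'echo "OMP=$OMP_NUM_THREADS OPENBLAS=$OPENBLAS_NUM_THREADS"'

# Back in the parent shell, show whether the OpenBLAS variable is set at all.
echo "OPENBLAS in parent shell: ${OPENBLAS_NUM_THREADS:-unset}"
```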