I'm looking for some clarification on compile flag options when using TensorFlow with NVIDIA GPUs on Ubuntu 18.04. I'm using TensorFlow from both Python (for training) and C++ (for execution in production).
Since Ubuntu 18.04 ships with GCC 7.x, I have to use CUDA 9.2, so I can't use the Google-provided pre-compiled TensorFlow binaries (those currently only work with CUDA 9.0, which is not compatible with GCC 7.x). Therefore, I have to compile TensorFlow from source for both the Python and C++ use cases.
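For reference, here is how I'm confirming the toolchain versions on my machine (this assumes CUDA is installed under /usr/local/cuda, the default location):
# Host compiler version (Ubuntu 18.04 ships GCC 7.x)
gcc --version
# CUDA toolkit version reported by the NVIDIA compiler driver
/usr/local/cuda/bin/nvcc --version
# Driver version and GPU visibility
nvidia-smi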
Currently I'm using the following compile flags:
Python compile:
bazel build --config=opt \
--config=cuda //tensorflow/tools/pip_package:build_pip_package
C++ compile:
bazel build -c opt \
--copt=-mavx \
--copt=-mavx2 \
--copt=-mfma \
--copt=-mfpmath=both \
--copt=-msse4.2 \
--config=cuda //tensorflow:libtensorflow_cc.so
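In case it's relevant, I'm choosing the -m CPU flags based on what the build machine advertises; a quick sanity check (Linux only, reading /proc/cpuinfo) looks like this:
# List which of the relevant SIMD instruction sets this CPU reports
grep -m1 'flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_2|avx|avx2|fma)$'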
The flag choices themselves are based mostly on popular vote on the interwebs, which makes me uncomfortable. Here are some of the sites/posts I consulted that led me to these choices:
https://www.tensorflow.org/install/source has:
bazel build --config=opt \
--config=cuda //tensorflow/tools/pip_package:build_pip_package
http://www.bitbionic.com/2017/08/18/run-your-keras-models-in-c-tensorflow/ has:
bazel build --jobs=6 \
--verbose_failures \
-c opt \
--copt=-mavx \
--copt=-mfpmath=both \
--copt=-msse4.2 //tensorflow:libtensorflow_cc.so
The question "How to compile Tensorflow with SSE4.2 and AVX instructions?" has:
bazel build -c opt \
--copt=-mavx \
--copt=-mavx2 \
--copt=-mfma \
--copt=-mfpmath=both \
--config=cuda -k //tensorflow/tools/pip_package:build_pip_package
The question "Re-build Tensorflow with desired optimization flags" has:
bazel build -c opt \
--copt=-mavx \
--copt=-mavx2 \
--copt=-mfma \
--copt=-mfpmath=both \
--copt=-msse4.2 \
--config=cuda -k //tensorflow/tools/pip_package:build_pip_package
Can somebody enlighten us all on these flag options? Specifically I have the following questions:
1) What are the mavx, mavx2, mfma, and mfpmath flags? Should they be used for both the Python and the C++ compile, or only for the C++ compile? The fact that the Google walk-through does not use them for the Python compile inclines me towards the same.
2) Clearly the --copt=-msse4.2 flag is there to enable SSE optimizations for Intel CPUs, and --config=cuda is for CUDA GPUs, but what is the -k option that appears after --config=cuda? Note that some of the above examples use the -k option and some do not.
3) Is there a place where these options are documented? I wonder if there are other flags that could be beneficial, or if some of the above should be omitted. I checked the TensorFlow and Bazel GitHub repos and did not find anything on this topic.