I'm looking for some clarification on compile flag options when using TensorFlow with NVIDIA GPUs on Ubuntu 18.04. I'm using TensorFlow from both Python (for training) and C++ (for execution in production).
Since Ubuntu 18.04 ships with GCC 7.x, I have to use CUDA 9.2, so I can't use the Google-provided pre-compiled TensorFlow binaries (those currently only work with CUDA 9.0, which is not compatible with GCC 7.x). Therefore, I have to compile TensorFlow from source for both the Python and C++ use cases.
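For reference, here is how I'm confirming the toolchain versions on my machine (this assumes CUDA is installed under /usr/local/cuda, the default location):
# Host compiler version (Ubuntu 18.04 ships GCC 7.x)
gcc --version
# CUDA toolkit version reported by the NVIDIA compiler driver
/usr/local/cuda/bin/nvcc --version
# Driver version and GPU visibility
nvidia-smi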
Currently I'm using the following compile flags:
Python compile:
bazel build --config=opt \
--config=cuda //tensorflow/tools/pip_package:build_pip_package
C++ compile:
bazel build -c opt \
--copt=-mavx \
--copt=-mavx2 \
--copt=-mfma \
--copt=-mfpmath=both \
--copt=-msse4.2 \
--config=cuda //tensorflow:libtensorflow_cc.so
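In case it's relevant, I'm choosing the -m CPU flags based on what the build machine advertises; a quick sanity check (Linux only, reading /proc/cpuinfo) looks like this:
# List which of the relevant SIMD instruction sets this CPU reports
grep -m1 'flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_2|avx|avx2|fma)$'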
The flag choices themselves are based mostly on popular vote on the interwebs, which makes me uncomfortable. Here are some of the sites/posts I consulted that led me to these choices:
https://www.tensorflow.org/install/source has:
bazel build --config=opt \
--config=cuda //tensorflow/tools/pip_package:build_pip_package
http://www.bitbionic.com/2017/08/18/run-your-keras-models-in-c-tensorflow/ has:
bazel build --jobs=6 \
--verbose_failures \
-c opt \
--copt=-mavx \
--copt=-mfpmath=both \
--copt=-msse4.2 //tensorflow:libtensorflow_cc.so
The question "How to compile Tensorflow with SSE4.2 and AVX instructions?" has:
bazel build -c opt \
--copt=-mavx \
--copt=-mavx2 \
--copt=-mfma \
--copt=-mfpmath=both \
--config=cuda -k //tensorflow/tools/pip_package:build_pip_package
The question "Re-build Tensorflow with desired optimization flags" has:
bazel build -c opt \
--copt=-mavx \
--copt=-mavx2 \
--copt=-mfma \
--copt=-mfpmath=both \
--copt=-msse4.2 \
--config=cuda -k //tensorflow/tools/pip_package:build_pip_package
Can somebody enlighten us all on these flag options? Specifically I have the following questions:
1) What are the mavx, mavx2, mfma, and mfpmath flags? Should they be used for both the Python and the C++ compile, or only for the C++ compile? The fact that the Google walk-through does not use them for the Python compile inclines me towards the same.
2) Clearly the --copt=-msse4.2 flag is there to enable SSE optimizations for Intel CPUs, and --config=cuda is for CUDA GPUs, but what is the -k option that appears after --config=cuda? Note that some of the above examples use the -k option and some do not.
3) Is there a place where these options are documented? I wonder if there are other flags that could be beneficial, or if some of the above should be omitted. I checked the TensorFlow and Bazel GitHub repos and did not find anything on this topic.