How to compile TensorFlow with SSE and AVX instructions on Windows?

Question

With the latest version of TensorFlow now available on Windows, I am trying to get everything working as efficiently as possible. However, even when compiling from source, I can't figure out how to enable the SSE and AVX instructions.

The default process: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake has no mention of how to do this.

The only reference I have found has been using Google's Bazel: How to compile Tensorflow with SSE4.2 and AVX instructions?

Does anyone know of an easy way to turn on these advanced instructions using MSBuild? I hear they give at least a 3X speed up.

To help those looking for a similar solution, the warning I am currently getting looks like this: https://github.com/tensorflow/tensorflow/tree/r0.12/tensorflow/contrib/cmake

I am using Windows 10 Professional (64-bit), Visual Studio 2015 Community Edition, and Anaconda Python 3.6 with cmake version 3.6.3 (later versions don't work for TensorFlow).

side note, they give "at most" 3x speed-up. You'll see this speed-up if your computation is mostly huge matrix multiplies — Yaroslav Bulatov, Mar 05 '17 at 19:03

score 6 · Answer 1 · answered Mar 13 '17 at 02:17

Well, I tried to fix that, but I am not sure if it really worked.

In CMakeLists.txt you will find the following statements:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)

On the MSVC platform, the test fails because MSVC doesn't support the -march=native flag. I modified the statements as below:

if (tensorflow_OPTIMIZE_FOR_NATIVE_ARCH)
  include(CheckCXXCompilerFlag)
  CHECK_CXX_COMPILER_FLAG("-march=native" COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
  if (COMPILER_OPT_ARCH_NATIVE_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
  else()
    CHECK_CXX_COMPILER_FLAG("/arch:AVX" COMPILER_OPT_ARCH_AVX_SUPPORTED)
    if(COMPILER_OPT_ARCH_AVX_SUPPORTED)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:AVX")
    endif()
  endif()
endif()

By doing this, cmake checks whether /arch:AVX is available and uses it. According to MSDN, SSE2 is enabled by default for x86 builds, and the /arch:SSE and /arch:SSE2 options are not available for x64 builds. For x64 builds you can choose /arch:AVX or /arch:AVX2. I used AVX above because my CPU only supports AVX; you can try AVX2 if you have a compatible CPU.
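One way to make it easier to see whether the flag was actually applied is to log the outcome of the check. The following is only a minimal sketch: `MY_MSVC_ARCH_FLAG` and `MY_MSVC_ARCH_FLAG_SUPPORTED` are hypothetical names that are not part of TensorFlow's build files, and the snippet assumes it is placed where `CMAKE_CXX_FLAGS` is still being assembled.

```cmake
# Sketch only: MY_MSVC_ARCH_FLAG is a hypothetical cache variable,
# not something defined by TensorFlow's CMakeLists.txt.
include(CheckCXXCompilerFlag)

# Default to AVX; set to /arch:AVX2 only if the target CPU supports it.
set(MY_MSVC_ARCH_FLAG "/arch:AVX" CACHE STRING "MSVC SIMD flag to try")

if (MSVC)
  # The result of the check is cached under this variable name.
  CHECK_CXX_COMPILER_FLAG("${MY_MSVC_ARCH_FLAG}" MY_MSVC_ARCH_FLAG_SUPPORTED)
  if (MY_MSVC_ARCH_FLAG_SUPPORTED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${MY_MSVC_ARCH_FLAG}")
    message(STATUS "Adding ${MY_MSVC_ARCH_FLAG} to CMAKE_CXX_FLAGS")
  else()
    message(WARNING "${MY_MSVC_ARCH_FLAG} was rejected by the compiler; building without it")
  endif()
endif()
```

Note that this check only tells you whether the compiler accepts the flag, not whether the CPU that runs the resulting binary supports the instructions. Because the check result is cached, changing the flag later probably requires clearing the cache before reconfiguring.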

When compiling with the above CMakeLists.txt, the build was much slower than for the official release. The warning about AVX/AVX2 disappeared, but the warning about SSE/SSE2/3/4.1/4.2 still exists. I think these warnings can be ignored because there's no SSE support for x64 MSBuild.

I am testing the new pip package now. It may be faster than before, but I don't want to write a new benchmark ...

Anyone who is interested in this, please test whether the new package is really faster.

I did all this on the latest git master branch as of 2017-3-12. The pip package name shows that it was tensorflow 1.0.1.

Compiling with the instructions from [here](https://github.com/tensorflow/tensorflow/tree/v1.1.0/tensorflow/contrib/cmake) with these adaptions works, but I see no speed-up, if building and running the GPU version. Note that I've also tried AVX2 the very same way, but as @TLJ mentioned, it is currently broken. — Alexander Pacha, May 20 '17 at 11:27

score 3 · Answer 2 · answered Mar 05 '17 at 20:27

I think you would have to add /arch:AVX2 to the compiler flags. One way to do it is to modify CMakeCache.txt in your build folder. Look for the line starting with CMAKE_CXX_FLAGS:STRING and change it to

CMAKE_CXX_FLAGS:STRING=/DWIN32 /D_WINDOWS /W3 /GR /EHsc /arch:AVX2 /fp:fast

However, according to this issue on GitHub, /arch:AVX2 is broken at the moment (at HEAD).
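As an alternative to editing CMakeCache.txt by hand, the same flags could be supplied through an initial-cache script, which CMake loads with its `-C` option. This is a sketch under assumptions: the file name `initial-cache.cmake` is hypothetical, the flag list is copied from the line above, and `/arch:AVX` is used instead of `/arch:AVX2` because the latter is reported as broken.

```cmake
# initial-cache.cmake (hypothetical file name)
# Pre-populates the cache before the project's CMakeLists.txt is processed.
# The default MSVC flags are repeated here because a value that is already
# in the cache replaces the defaults CMake would otherwise fill in.
set(CMAKE_CXX_FLAGS
    "/DWIN32 /D_WINDOWS /W3 /GR /EHsc /arch:AVX /fp:fast"
    CACHE STRING "C++ compiler flags including the SIMD option")
```

It would be passed as `cmake -C initial-cache.cmake` followed by the usual generator and source-directory arguments. This is intended for a fresh, empty build directory, since entries already stored in an existing CMakeCache.txt may take precedence.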

TLJ

Tried your suggestion of editing the CMakeCache.txt file in the build folder... after uninstalling/reinstalling the new build, no luck :(. Is there not a cmake option I can pass in to do the same thing? – Aerophilic Mar 08 '17 at 03:34
From `CMakeLists.txt`, the flag `tensorflow_OPTIMIZE_FOR_NATIVE_ARCH` (on by default) is supposed to set `-march=native` automatically for anyone who opts in by compiling the code themselves. This should do the trick when using gcc. I'm not sure if it does the same on VC. – TLJ Mar 08 '17 at 15:09
@Aerophilic I am looking into this too. According to [MSDN](https://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx), SSE2 support is enabled by default. – Wesley Ranger Mar 12 '17 at 13:24
@Aerophilic Well, SSE/SSE2 are for x86 only, and for x64 you should use [AVX/AVX2](https://msdn.microsoft.com/en-us/library/jj620901.aspx). For my CPU, only AVX is supported. I am trying to compile with the `/arch:AVX` option, and I'll post it here if I have any luck. – Wesley Ranger Mar 12 '17 at 13:38

LuJyKa · Answer 3 · 2017-10-26T02:57:25.237

TensorFlow makes a mistake with the flag "tensorflow_WIN_CPU_SIMD_OPTIONS".

It should be a string flag, not a Boolean.

Before Fix Image

After Fix Image

How to Fix it

In "Tensorflow-github/tensorflow/contrib/cmake/CMakeLists.txt", line 34, there is:

option(tensorflow_WIN_CPU_SIMD_OPTIONS "Enables CPU SIMD instructions")

Replace it with

set(tensorflow_WIN_CPU_SIMD_OPTIONS "/arch:AVX" CACHE STRING "Enables CPU SIMD instructions" )

Then clear the cmake cache (location) and reconfigure.

You will find that tensorflow_WIN_CPU_SIMD_OPTIONS is now a flag with a text input field instead of a checkbox.

Either "/arch:AVX" or "/arch:AVX2" can be entered.
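To make the two accepted values visible in cmake-gui, the string cache entry could additionally be given a `STRINGS` property. This is a minimal sketch that builds on the `set(...)` line above. The `set_property` call and the status message are additions for illustration and are not part of TensorFlow's CMakeLists.txt.

```cmake
# Declare the option as a string cache entry (same line as in the fix above).
set(tensorflow_WIN_CPU_SIMD_OPTIONS "/arch:AVX" CACHE STRING "Enables CPU SIMD instructions")

# Illustrative addition: offer the two known values as a drop-down in cmake-gui.
# The STRINGS property is a GUI convenience and does not validate the value.
set_property(CACHE tensorflow_WIN_CPU_SIMD_OPTIONS PROPERTY STRINGS "/arch:AVX" "/arch:AVX2")

# Print the value that will be used so it shows up in the configure log.
message(STATUS "tensorflow_WIN_CPU_SIMD_OPTIONS = ${tensorflow_WIN_CPU_SIMD_OPTIONS}")
```

With a string cache entry, the value can also be chosen on the command line, for example with `-Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX2`.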
