
This is the message received from running a script to check whether TensorFlow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I noticed that it mentions SSE4.2 and AVX:

  1. What are SSE4.2 and AVX?
  2. How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?
  3. How can I make TensorFlow compile using these two instruction sets?
phuclv
GabrielChu
  • I like to build with these flags `bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package` On Xeon E5 v3 that gives me 3x improvement in 8k matmul CPU speed compared to the official release (0.35 -> 1.05 T ops/sec) – Yaroslav Bulatov Dec 23 '16 at 00:15
  • and don't forget `NOTE on gcc 5 or later: the binary pip packages available on the TensorFlow website are built with gcc 4, which uses the older ABI. To make your build compatible with the older ABI, you need to add --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" to your bazel build command. ABI compatibility allows custom ops built against the TensorFlow pip package to continue to work against your built package.` from here https://www.tensorflow.org/install/install_sources – Ivan Kush Jun 17 '17 at 17:33
  • I have some compiled binaries for TF supporting these instructions https://github.com/lakshayg/tensorflow-build. You might find this helpful. – lakshayg Jul 10 '17 at 03:24
  • @IvanKush having added that flag, I'm still unable to successfully import tensorflow (compiles fine). If you successfully compiled with gcc 5, please see: https://stackoverflow.com/questions/45877158/build-tensorflow-from-source-with-gcc-5?noredirect=1#comment78712788_45877158 – anon01 Aug 25 '17 at 08:29
  • If using Ubuntu 16.04, we have builds for almost all variants you will possibly need at https://github.com/mind/wheels – danqing Nov 13 '17 at 22:55
  • You can refer to this tutorial https://medium.com/@exMachina9/how-to-install-tensorflow-with-binaries-and-tensorflow-models-on-mac-os-3e242408f91b – shyam padia Apr 10 '18 at 14:46
  • I would like to point out to everyone compiling with Microsoft Visual C++ (msvc), most of the answers here assume you are using `gcc` or `clang`. The compiler options in these answers won't do anything in msvc. As someone who has compiled only a handful of things, this wasn't obvious to me. – JoseOrtiz3 Oct 05 '18 at 05:39

12 Answers


I just ran into this same problem. It seems that Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 was enough to fix it. In the end, I successfully built with

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

without getting any warnings or errors.

Probably the best choice for any system is:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

(Update: the build scripts may be eating -march=native, possibly because it contains an =.)

-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)

I'm not sure whether TensorFlow's default optimization level is -O2 or -O3. gcc -O3 enables full optimization including auto-vectorization, but that can sometimes make code slower.


What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time optimization).

x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.

-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.

TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.

-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)
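To see which of these features your own CPU actually reports before picking flags, you can inspect the kernel's CPU-feature list. A minimal sketch, assuming Linux (where /proc/cpuinfo exists); the parsing helper is my own illustration, not part of any toolchain:

```python
# Which SIMD features does this CPU report? (Linux-only sketch:
# /proc/cpuinfo does not exist on macOS or Windows.)
def supported_simd(cpuinfo_text):
    """Return which of the SIMD feature names relevant to these
    build flags appear in a /proc/cpuinfo-style 'flags' line."""
    wanted = {"sse4_1", "sse4_2", "avx", "avx2", "fma"}
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return sorted(wanted & set(line.split(":", 1)[1].split()))
    return []

try:
    with open("/proc/cpuinfo") as f:
        print(supported_simd(f.read()))
except FileNotFoundError:
    print("no /proc/cpuinfo on this OS")
```

If the list comes back with everything your build flags name, -march=native will pick all of them up automatically.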

Peter Cordes
Mike Chiu
  • It works indeed, +1! So it seems `-march=native` does not do its job. Besides, dropping the `--config=cuda` (if CUDA support is not needed) and the `-k` (since no error occurred during the compilation) works as well. – Marc Jan 20 '17 at 06:08
  • After uninstalling and reinstalling the new compiled version I still get warnings for AVX, AVX2 and FMA. – Benedikt S. Vogler Mar 02 '17 at 15:00
  • I had to drop `--copt=-mfpmath=both` to make it work with `clang` on macOS. Does it affect the resulting binary? – gc5 Mar 22 '17 at 14:49
  • Which operating system did you use? – Naveen Dennis Mar 22 '17 at 17:36
  • I see instructions on how to do a system-wide install on the main page for TF. I am wondering: If you still want to use self-compiled versions of TF within isolated conda environments, how would that work? – Thornhale Mar 29 '17 at 03:52
  • This answer is very old. Is it still current? – Thornhale Mar 29 '17 at 07:58
  • Just for clarification: when I create the configure file, do I simply use --copt=-march=native? Or do I put in all those optimizations seen in the original posts where I have the option to put in the optimizations? – Thornhale Mar 30 '17 at 01:22
  • Did the same as @Marc and everything worked, no warnings anymore in the execution. – Hamza Abbad Apr 18 '17 at 15:28
  • I get an error saying that the 'build' command is only supported from a workspace. What should I do? – humble Jun 05 '17 at 05:03
  • I would suggest first checking which flags are supported by your CPU: `gcc -march=native -Q --help=target` and then compiling with matching flags. In my case even though cpu supports both sse4.1/2 and avx it does not support avx2. – Piotr Bazan Jul 20 '17 at 11:58
  • @Thornhale use conda environments. Calling pip from an active environment should install into that specific environment, something like `~/anacondaX/envs/env_name_X/lib/pythonX.X/site-packages` – anon01 Aug 25 '17 at 06:18
  • You're missing `-mtune=native`. You could and should use `-march=native` instead of `--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2`, unless TensorFlow's build system overrides `-march`. – Peter Cordes Sep 02 '17 at 23:06
  • I'm sceptical that `-mfpmath=both` will be useful on modern CPUs, unless it gets gcc to use multiple accumulators in non-vectorizable loops where it would otherwise have had a single latency chain. On current CPUs, x87 competes for the same execution ports as SSE/AVX mul/add/fma. (http://agner.org/optimize/). So if it helps, it's only going to be in scalar code, and then only in cases where having more registers helps, or when the compiler just did a bad job. – Peter Cordes Sep 02 '17 at 23:08
  • I left an edit on this answer instead of just posting my own because I think it's useful to update this already highly-voted answer now, since it's become a sort of canonical answer that people will copy/paste from. `-march=native` will enable AVX512 on systems where it's available, as well as enabling good stuff like BMI2. I didn't change the recommendation for `-mfpmath=both`, because I haven't benchmarked. It makes some code that looks worse than with the default (for `-m64`) of `-mfpmath=sse`, https://godbolt.org/g/p2KLEC but maybe it does really help in other cases. – Peter Cordes Sep 03 '17 at 00:13
  • @Mike Chiu, thanks for your answer! But how can I build `tensorflow-gpu` for Windows on Ubuntu (or macOS)? – Dmitry Sep 10 '17 at 07:16
  • My 5 cents. For me it was necessary to call `pip3 uninstall tensorflow` before `pip3 install /tmp/tensorflow_pkg/tensorflow-1.3.0-cp36-cp36m-macosx_10_6_intel.whl`. – Vladimir Vlasov Oct 01 '17 at 16:50
  • Once compilation is completed, is there a way to check what flags TensorFlow was compiled with to verify success? – ehiller Nov 08 '17 at 21:07
  • Is it possible to make a docker file that does that? – Diego Orellana Jan 29 '18 at 15:56
  • Thanks for the note about `-mfpmath=both` not working with clang. However, your suggestion to use `-mfpmath=sse` also does not compile. `--copt=-mfpmath=sse` when compiling with clang-6, results in: `error: unknown FP unit 'sse'` – Brent Faust Feb 07 '19 at 19:14
  • I tried the second command and now I get this error: This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance-critical operations: AVX AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. – Shilan Jun 11 '20 at 20:28

Let's start with an explanation of why you see these warnings in the first place.


Most probably you did not install TF from source and instead used something like pip install tensorflow. That means you installed pre-built binaries (built by someone else) which were not optimized for your architecture. These warnings tell you exactly that: something is available on your architecture, but it will not be used because the binary was not compiled for it. Here is the relevant part of the documentation.

TensorFlow checks on startup whether it has been compiled with the optimizations available on the CPU. If the optimizations are not included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not included.

The good news is that if you just want to learn/experiment with TF, everything will still work properly and you should not worry about it.


What are SSE4.2 and AVX?

Wikipedia has a good explanation about SSE4.2 and AVX. This knowledge is not required to be good at machine-learning. You may think about them as a set of some additional instructions for a computer to use multiple data points against a single instruction to perform operations which may be naturally parallelized (for example adding two arrays).
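The "adding two arrays" case can be made concrete with NumPy, whose compiled elementwise loops are exactly the kind of code these instructions accelerate (a small illustration of the idea, not TensorFlow-specific):

```python
# One operation, many data points: an elementwise add over whole
# arrays is the textbook SIMD-friendly operation. NumPy's compiled
# inner loop can use SSE/AVX under the hood; a Python for-loop cannot.
import numpy as np

a = np.arange(8, dtype=np.float32)      # [0, 1, ..., 7]
b = np.full(8, 10, dtype=np.float32)    # [10, 10, ..., 10]
c = a + b                               # one vectorized add over all lanes
print(c.tolist())                       # [10.0, 11.0, ..., 17.0]
```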

Both SSE and AVX are implementations of the abstract idea of SIMD (single instruction, multiple data), which is

a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Thus, such machines exploit data level parallelism, but not concurrency: there are simultaneous (parallel) computations, but only a single process (instruction) at a given moment

This is enough to answer your next question.


How do SSE4.2 and AVX improve CPU computations for TF tasks?

They allow a more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides


How can I make TensorFlow compile using these two instruction sets?

You need to have a binary which was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

Salvador Dali
  • What does this command line mean? And should I install `bazel` in this case? – Y. Z. Aug 13 '17 at 19:33
  • Has anyone ever built under Windows 64-bit with MSYS2 or Visual Studio 2017 Community Edition, and can you share the steps? – James Chang Mar 08 '18 at 05:21
  • Can this pip package be installed into a conda environment on the local machine? – dgketchum Sep 04 '18 at 20:38
  • After 3+ hours (Elapsed time: 11984.258s) I got `FAILED: Build did NOT complete successfully`. It is not that simple to compile it yourself. – imbrizi Nov 26 '18 at 20:03
  • same here. My build failed too and then in the logs I can see that: cl : Command line warning D9002 : ignoring unknown option '-mavx' cl : Command line warning D9002 : ignoring unknown option '-mavx2' cl : Command line warning D9002 : ignoring unknown option '-mfma' cl : Command line warning D9002 : ignoring unknown option '-mfpmath=both' cl : Command line warning D9002 : ignoring unknown option '-msse4.2' cl : Command line warning D9002 : ignoring unknown option '-fno-strict-aliasing' cl : Command line warning D9002 : ignoring unknown option '-fexceptions' so these options aren't known – Shilan Jun 11 '20 at 21:39
  • doc link is broken – Peter Cotton Oct 13 '21 at 16:30
  • @Salvador Dali, I managed to build with AVX and SSE together, but unfortunately on a Mac with an M1 CPU it won't run; if I build separately with only SSE it runs. Do you know if it's possible to build one tensorflow that will support and run on both CPUs? – Stav Bodik Feb 13 '23 at 17:30

Let me answer your 3rd question first:

If you want to run a self-compiled version within a conda-env, you can. These are the general instructions I run to get tensorflow to install on my system with the additional instruction sets enabled. Note: This build was for an AMD A10-7850 (check your CPU for which instructions are supported; yours may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda-env. Credit goes to the tensorflow source install page and the answers provided above.

git clone https://github.com/tensorflow/tensorflow 
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
pip install six numpy wheel packaging appdirs
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Run your build like below. Note: check which instructions your CPU
# supports. Also, if resources are limited, consider adding the flag
# --local_resources 2048,.5,1.0 . This will limit how much RAM the
# build uses but will increase the time to compile.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2  -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack
pip install keras jupyter etc. etc.
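Once the wheel is installed, you can confirm the rebuild took effect by checking that TensorFlow's startup output no longer contains the "wasn't compiled to use" warnings. A small helper sketch of my own, based on the warning format shown in the question:

```python
# Scan captured TensorFlow startup output for the CPU-feature warnings
# from the question; an empty result means the new build uses them all.
def missing_features(log_text):
    marker = "wasn't compiled to use "
    feats = []
    for line in log_text.splitlines():
        if marker in line:
            # e.g. "...wasn't compiled to use SSE4.2 instructions, but..."
            feats.append(line.split(marker, 1)[1].split()[0])
    return feats

old_log = ("W tensorflow/core/platform/cpu_feature_guard.cc:95] The "
           "TensorFlow library wasn't compiled to use AVX instructions, "
           "but these are available on your machine and could speed up "
           "CPU computations.")
print(missing_features(old_log))  # ['AVX']
print(missing_features(""))       # []
```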

As to your 2nd question:

A self-compiled version with optimizations is well worth the effort in my opinion. On my particular setup, calculations that used to take 560-600 seconds now take only about 300 seconds! Although the exact numbers will vary, I think you can expect about a 35-50% speed increase in general on your particular setup.

Lastly your 1st question:

A lot of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2 and FMA are different kinds of extended instruction sets on x86 CPUs. Many contain optimized instructions for processing matrix or vector operations.

I will highlight my own misconception to hopefully save you some time: It's not that SSE4.2 is a newer version of instructions superseding SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).

In the context of tensorflow compilation, if your computer supports AVX2 and AVX, and SSE4.1 and SSE4.2, you should put in those optimizing flags for all of them. Don't do what I did and just go with SSE4.2, thinking that it's newer and should supersede SSE4.1. That's clearly WRONG! I had to recompile because of that, which cost me a good 40 minutes.

Thornhale
  • Where does the .whl file get stored? I want to install it on Windows too. – WiLL_K Apr 06 '17 at 12:18
  • It's stored here: /tmp/tensorflow_pkg (on your linux drive) – Thornhale Apr 06 '17 at 13:24
  • Can you tell me how much time this will take? It's been about 2 hrs and my laptop froze. It's running Ubuntu with 4 GB of RAM and an i5 processor – WiLL_K Apr 06 '17 at 13:25
  • Hmm, compiling tensorflow does take a long time. On my laptop with 8 gb it took about 1.5 hours. However, your install times may vary and will heavily be influenced by available ram. These compilations are known to take a lot of RAM. To reduce resource requirements and perhaps prevent freezes, you could run the compilation by adding the following flag after "bazel build": --local_resources 2048,.5,1.0 This often helps with freezes but will probably double the time it takes to compile. For example: On one of my faster systems, compiling without the flag took 2200 seconds, with flag 4500 ! – Thornhale Apr 06 '17 at 13:32
  • I am okay with laptop freezing but ultimately it should compile. I have a lot riding on this project – WiLL_K Apr 06 '17 at 13:34
  • @Thronhale Will the .whl file created on linux work on windows machine? Because its not working in my case, if not how to create it for windows? – WiLL_K Apr 07 '17 at 12:17
  • In general, I found doing ML on Windows is a big pain in the behind. You end up spending a lot of time trying to get things to work that just work if you work in a Linux environment. I believe that tensorflow needs to be compiled for each OS. Furthermore, if you go here: [link](https://www.tensorflow.org/install/install_sources), you will see that tensorflow is not officially supported. There does exist some guide on how to compile tensorflow for Windows here: [link](https://www.tensorflow.org/install/install_sources). Though I have to admit, I have not tried that out. I am just using ubuntu. – Thornhale Apr 07 '17 at 14:16
  • I know that windows is not the right platform for doing ML, I myself am a linux guy but I am a student and am doing a project wherein we are creating a common environment in Matlab for all the ML libraries available. Its just to test the efficiency and accuracy of these libraries. Though this looks far more easy, just implement models and call them from matlab where you can provide all the necessary parameters and comapre. But this is also the problem, until and unless you have errors or warnings matlab will throw an error. Its really frustrating – WiLL_K Apr 07 '17 at 14:24
  • Strangely, as amortazi [commented](https://github.com/tensorflow/tensorflow/issues/7069#issuecomment-275216149), make sure you perform the `pip3 install ...` command from outside the tensorflow repo, or it won't work! – reubenjohn May 09 '17 at 13:48
  • How faster is tensorflow with AVX and AVX2 compared with it without AVX and AVX2? – Dmitry Sep 10 '17 at 03:11
  • @Dmitry: As I mentioned, you can expect a 30-50% speed increase. The speed increase mostly come from the use of the matrix operations enabled with AVX etc. – Thornhale Oct 04 '17 at 18:21
  • But when GPU is used there is no speed increase! – Dmitry Oct 04 '17 at 21:36
  • That is correct because then you are not using the CPU matrix operations but GPU resources. – Thornhale Oct 04 '17 at 22:31

These are SIMD vector processing instruction sets.

Using vector instructions is faster for many tasks; machine learning is such a task.

Quoting the tensorflow installation docs:

To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command.

Josh Bleecher Snyder
  • Why doesn't the Tensorflow binary use CPU dispatching? Is that poorly supported by GCC? – Chris Pushbullet Feb 21 '17 at 03:33
  • The link "tensorflow installation docs" does not work. So I am wondering if this answer is still valid. Please respond! – Thornhale Mar 29 '17 at 07:58
  • @ChrisPushbullet you can compile Tensorflow to support several different compute capabilities for the GPU, but they increase the binary size a lot. My guess is that the same is for the CPU. – Davidmh Jul 20 '18 at 05:05

Thanks to all these replies plus some trial and error, I managed to install it on a Mac with clang. So I'm just sharing my solution in case it is useful to someone.

  1. Follow the instructions on Documentation - Installing TensorFlow from Sources

  2. When prompted for

    Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

then copy-paste this string:

-mavx -mavx2 -mfma -msse4.2

(The default option caused errors, so did some of the other flags. I got no errors with the above flags. BTW I replied n to all the other questions)

After installing, I verified a ~2x to 2.5x speedup when training deep models with respect to another installation based on the default wheels - Installing TensorFlow on macOS

Hope it helps

Prags
JARS
  • `-march=native` should be even better if your compiler supports it correctly. It also sets `-mtune=native` to make good instruction choices for your CPU. e.g. on Haswell and later, it disables `-mavx256-split-unaligned-store` and `-mavx256-split-unaligned-load`, which are on by default for `-mtune=generic` and hurt performance when data isn't known to be aligned but turns out to be at run-time. – Peter Cordes Nov 06 '17 at 21:50
  • Thanks! In my case `-march=native` caused an error while the other options did not. Maybe it's the specific compiler. I was sharing this precisely just in case others experienced the same roadblock. – JARS Nov 06 '17 at 21:54
  • What error? Unless the build system chokes on a string with an `=` in it, or you're not using `gcc` or `clang`, it should work. And does `-mtune=native -mavx2 -mfma` work for you? Or `-mtune=skylake`? (Or whatever CPU you have). BTW, `-mavx2` implies `-mavx` and `-msse4.2`. It doesn't *hurt* to include them all in a recipe, and I guess makes it easier for people to leave out the ones their CPU doesn't support. – Peter Cordes Nov 06 '17 at 21:59
  • I've edited the top answer on this question a while ago, but I don't use tensorflow myself. If there's something wrong with `-march=native` for its build system, I'd like to know. (And/or you should report it upstream so they can fix their build scripts). – Peter Cordes Nov 06 '17 at 22:01
  • Thanks a lot for the suggestion. In order to check that, I've re-run the .configure script with only `-march=native` and this is the error : /Users/jose/Documents/code/tmptensorflow/tensorflow/tensorflow/core/BUILD:1442:1: C++ compilation of rule '//tensorflow/core:lib_internal_impl' failed (Exit 1). In file included from tensorflow/core/platform/denormal.cc:37: /Library/Developer/CommandLineTools/usr/bin/../lib/clang/7.0.2/include/pmmintrin.h:28:2: error: "SSE3 instruction set not enabled" #error "SSE3 instruction set not enabled" using Apple LLVM version 7.0.2 (clang-700.1.81) – JARS Nov 07 '17 at 08:32
  • In the log, did `-march=native` actually make it to the compiler command line? The only explanation that makes any sense is that a build script "ate" the option, probably because it contains an `=`. (It makes sense that `#include ` would give that error message without `-march=native`, and also that TensorFlow would require SSE3 as a baseline.) No wonder people have so much trouble with building TensorFlow!! Maybe you can try `'-march=native'` (inside single quotes), or `-march\=native`. – Peter Cordes Nov 07 '17 at 08:37

I have recently installed it from source, and below are all the steps needed to install it from source with the mentioned instruction sets enabled.

Other answers already describe why those messages are shown. My answer gives a step-by-step guide to the installation, which may help people struggling with the actual install as I did.

  1. Install Bazel

Download it from one of their available releases, for example 0.5.2. Extract it, go into the directory and configure it: bash ./compile.sh. Copy the executable to /usr/local/bin: sudo cp ./output/bazel /usr/local/bin

  2. Install Tensorflow

Clone tensorflow: git clone https://github.com/tensorflow/tensorflow.git Go to the cloned directory to configure it: ./configure

It will prompt you with several questions; below I have suggested a response to each question. You can, of course, choose your own responses as you prefer:

Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] Y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] N
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
  3. The pip package. To build it you have to describe which instructions you want (you know, those Tensorflow informed you are missing).

Build pip script: bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package

Build pip package: bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

Install Tensorflow pip package you just built: sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl

Now the next time you start up Tensorflow it will not complain anymore about missing instructions.

Eduardo
  • Building with just `-c opt --copt=-march=native` should be at least as good as `--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.1 --copt=-msse4.2`. (Both will silence the warning, but `-march=native` might make even faster code by tuning specifically for the CPU on the system you're building on). Also note that `--copt=-mavx2 --copt=-mfma` implies all the earlier AVX and SSE options, so this long string of options was clearly written by someone that doesn't understand gcc options. – Peter Cordes Sep 03 '17 at 21:29
  • @PeterCordes, take a look into this issue (https://github.com/tensorflow/tensorflow/issues/7449), even bazel maintainers were not assertive why march=native did not work as expected. As you seem to "understand gcc options" then you can probably help them to fix it, as they have marked the issue as needing "community support". – Eduardo Sep 09 '17 at 14:46
  • Thanks, I'll take a look... Hmm, some people saying that `--copt=-mavx2` didn't work. **If** `--copt=-mfma` works, `--copt=-march=native` should work, unless parsing of the `=` is a problem. For gcc/clang/icc, you definitely want the build script to eventually pass `-march=native` to the compiler. Making that happen via build scripts becomes the trick. – Peter Cordes Sep 09 '17 at 19:29

This is the simplest method. Only one step.

It has a significant impact on speed. In my case, the time taken for a training step almost halved.

Refer to custom builds of tensorflow

Sreeragh A R
  • Windows builds including AVX2 https://github.com/fo40225/tensorflow-windows-wheel – Chris Moschini Oct 03 '18 at 15:26
  • @SreeraghAR Your method downgraded my tensorflow and keras. – asn Dec 31 '18 at 05:27
  • Please make sure you install correct file according to your TensorFlow, Python versions and HW. – Sreeragh A R Dec 31 '18 at 06:09
  • @SreeraghAR `TensFlow` version is 1.10.0 and using `MacOS Sierra`. Help me in finding the file. – asn Dec 31 '18 at 12:37
  • Hmm.. Can't find one corresponding to your versions. Some one has to build a custom wheel. https://github.com/yaroslavvb/tensorflow-community-wheels Immediate solution could be using Tensorflow 1.9.0 – Sreeragh A R Dec 31 '18 at 12:53
  • As @ChrisMoschini mentioned, uninstalling the tensorflow package and installing the right version from [the provided repo](https://github.com/fo40225/tensorflow-windows-wheel) solved all my problems – Alex Klaus Aug 18 '20 at 02:55

I compiled a small Bash script for Mac (it can easily be ported to Linux) to retrieve all CPU features and apply some of them to build TF. I'm on TF master and use it fairly often (a couple of times a month).

https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f


To compile TensorFlow with SSE4.2 and AVX, you can use directly

bazel build --config=mkl --config="opt" --copt="-march=broadwell" --copt="-O3" //tensorflow/tools/pip_package:build_pip_package

Source: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-cpu-mkl

supercheval
  • Did something change recently? Last I checked `--copt="-march=native"` was eating the `=`. (And BTW, those double quotes don't do anything; they'll be removed by the shell before `bazel` sees your command line.) – Peter Cordes Jun 16 '18 at 12:27

2.0 COMPATIBLE SOLUTION:

Execute the below commands in Terminal (Linux/MacOS) or in Command Prompt (Windows) to install Tensorflow 2.0 using Bazel:

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow

#The repo defaults to the master development branch. You can also checkout a release branch to build:
git checkout r2.0

#Configure the Build => Use the Below line for Windows Machine
python ./configure.py 

#Configure the Build => Use the Below line for Linux/MacOS Machine
./configure
#This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options. 

#Build Tensorflow package

#CPU support
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package 

#GPU support
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
  • Which part of this specifies `-march=native`, or other GCC/clang options? I don't see any mention of AVX, FMA, or SSE4.2 in this. (And is Bazel or Tensorflow's build script still broken in a way that only options like `-mavx` work, not `-march=native`? If that's what the problem really was in the top answer on this question) – Peter Cordes Nov 28 '19 at 12:43
  • For CPU support with tf version 2.1.0, the option --config=opt did not work for me; I solved it with --config=v2. Also, it is good to mention that the right bazel version to build it is 29.0. – Tolik Mar 16 '20 at 20:50

When building TensorFlow from source, you'll run the configure script. One of the questions that the configure script asks is as follows:

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]

The configure script will attach the flag(s) you specify to the bazel command that builds the TensorFlow pip package. Broadly speaking, you can respond to this prompt in one of two ways:

  • If you are building TensorFlow on the same CPU type as the one on which you'll run TensorFlow, then you should accept the default (-march=native). This option will optimize the generated code for your machine's CPU type.
  • If you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider supplying a more specific optimization flag as described in the gcc documentation.

After configuring TensorFlow as described in the preceding bulleted list, you should be able to build TensorFlow fully optimized for the target CPU just by adding the --config=opt flag to any bazel command you are running.

Barry Rosenberg

To hide those warnings, you could do this before your actual code.

import os
# TF_CPP_MIN_LOG_LEVEL: 0 = all messages, 1 = filter out INFO,
# 2 = filter out INFO and WARNING, 3 = filter out INFO, WARNING, and ERROR.
# This must be set before tensorflow is imported.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
Carl Binalla
javac
  • Silently running slower than it could on your hardware seems like a bad idea. – Peter Cordes Sep 02 '17 at 22:52
  • I agree with @Peter Cordes in general - but sometimes it's nice (in a disciplined, mindful manner) to hide the warnings and focus on the task. – westsider Oct 02 '17 at 21:56
  • @westsider: yeah, it could be useful in some cases, but this isn't a good answer unless it points out the implications: there is real performance being lost if you just hide the warnings instead of recompiling. (Except maybe if you're using a GPU for the heavy lifting, it might still warn about CPU options?) – Peter Cordes Oct 02 '17 at 22:18