Does anyone know whether the TensorFlow executables compiled here include AVX support? I have been running that compiled version of TensorFlow on Google Compute Engine and it is slow. Dog slow. Cold molasses slow. LA traffic slow. This article says that compiling with AVX support significantly improves performance on Google Compute Engine, but when I follow the compile process described there, it fails. So I'm just wondering: is AVX already enabled in those executables?
Disassemble it with something like `objdump -d` and look for `%ymm`. If you ever see that string as part of a register name, the code is using AVX. (Intel syntax doesn't use `%` prefixes to separate register names from symbol names, so in that case just look for `ymm0` through `ymm15`.) Of course, that won't detect 128-bit AVX instructions like `vmulps %xmm0, %xmm1, %xmm2`, so you could also look for `vmulps`: the leading `v` and (usually) three operands mark it as an AVX instruction rather than legacy SSE. – Peter Cordes Jul 21 '16 at 06:51
2 Answers
No, the default TensorFlow distributions are built without CPU extensions such as SSE4.1, SSE4.2, AVX, AVX2, and FMA, because these builds (e.g. the ones from `pip install tensorflow`) are intended to be compatible with as many CPUs as possible. Another argument is that even with these extensions a CPU is a lot slower than a GPU, and medium- and large-scale machine-learning training is expected to be performed on a GPU. See also a related discussion here.
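Whether enabling them would even help depends on what your GCE instance's CPU supports. A quick way to check (Linux only; this reads the kernel's reported CPU flags, not what TensorFlow was built with) is:

```shell
# List which of the relevant SIMD extensions the kernel reports
# for this CPU; prints "none reported" if none are present.
flags=$(grep -m1 -o -w -e sse4_1 -e sse4_2 -e avx -e avx2 -e fma /proc/cpuinfo | sort -u)
echo "${flags:-none reported}"
```

On current GCE machine types you should typically see at least `avx` in this list; if you do, a custom build can use it.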
The article is right: AVX and FMA instructions significantly (by up to 300%!) speed up linear algebra computations such as dot products, matrix multiplication, and convolution. If you want to take advantage of them, you'll have to compile TensorFlow from source, which is discussed in this question.
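For reference, a typical source build that enables these extensions looks roughly like the following. This is a sketch only: the exact flags and the package target vary across TensorFlow and Bazel versions, and `/tmp/tensorflow_pkg` is an arbitrary output directory, so treat the linked question and the official build guide as authoritative.

```shell
# From a checkout of the tensorflow source tree:
./configure

# Build with the CPU extensions your machine supports.
bazel build -c opt \
    --copt=-msse4.1 --copt=-msse4.2 \
    --copt=-mavx --copt=-mavx2 --copt=-mfma \
    //tensorflow/tools/pip_package:build_pip_package

# Package the result as a wheel and install it.
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```

Only pass `--copt` flags for extensions your CPU actually reports; a binary built with `-mavx2` will crash with an illegal-instruction error on a machine without AVX2.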

This is the simplest method, requiring only one step, and it has a significant impact on speed: it can make training up to three times faster.
