Background info
I would like to run tensorflow-serving on an older machine (the target system) that doesn't support the modern CPU instructions used in the standard TensorFlow build. I used these instructions for installing tf-serving via docker. However, I ran into the error "Tensorflow Serving Illegal Instruction core dumped", similar to this one on github. The suggested solution was to use a docker build image to compile the binary on my target system, which is described here.
Since this part is relevant for the reproduction of my issue I will copy the relevant commands here:
git clone https://github.com/tensorflow/serving
cd serving
docker build --pull -t $USER/tensorflow-serving-devel -f tensorflow_serving/tools/docker/Dockerfile.devel .
This will compile the binary with the flag -march=native in my docker container on my slow target machine, and it works.
Target system info
However, on my old machine the compilation takes forever, and I would like to use my other, more powerful PC to cross-compile the binary. I used the commands provided in this answer to find out the compilation flags needed for my target system, in order to replicate the build flag -march=native, which is the default flag implicitly used in the above process.
gcc -### -E - -march=native 2>&1 | sed -r '/cc1/!d;s/(")|(^.* - )//g'
gave me the following flags:
-march=core2 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mno-movbe -mno-aes -mno-sha -mno-pclmul -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mno-avx -mno-avx2 -mno-sse4.2 -mno-sse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mno-xsave -mno-xsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-mwaitx -mno-clzero -mno-pku --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=core2
Note especially the following flags at the end, which contain spaces:
--param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048
I can provide these flags to the docker build process via the build argument TF_SERVING_BUILD_OPTIONS, as described in the docs here. This string is then used to run bazel build, as can be seen in Dockerfile.devel. Thus I take all the flags from above, put --copt= in front of each, and put the resulting string in the variable TF_SERVING_BUILD_OPTIONS. This is my complete command, including the copts with spaces at the end:
docker build --pull \
--build-arg TF_SERVING_BUILD_OPTIONS="--copt=-mmmx --copt=-mno-3dnow --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-mssse3 --copt=-mno-sse4a --copt=-mcx16 --copt=-msahf --copt=-mno-movbe --copt=-mno-aes --copt=-mno-sha --copt=-mno-pclmul --copt=-mno-popcnt --copt=-mno-abm --copt=-mno-lwp --copt=-mno-fma --copt=-mno-fma4 --copt=-mno-xop --copt=-mno-bmi --copt=-mno-bmi2 --copt=-mno-tbm --copt=-mno-avx --copt=-mno-avx2 --copt=-mno-sse4.2 --copt=-mno-sse4.1 --copt=-mno-lzcnt --copt=-mno-rtm --copt=-mno-hle --copt=-mno-rdrnd --copt=-mno-f16c --copt=-mno-fsgsbase --copt=-mno-rdseed --copt=-mno-prfchw --copt=-mno-adx --copt=-mfxsr --copt=-mno-xsave --copt=-mno-xsaveopt --copt=-mno-avx512f --copt=-mno-avx512er --copt=-mno-avx512cd --copt=-mno-avx512pf --copt=-mno-prefetchwt1 --copt=-mno-clflushopt --copt=-mno-xsavec --copt=-mno-xsaves --copt=-mno-avx512dq --copt=-mno-avx512bw --copt=-mno-avx512vl --copt=-mno-avx512ifma --copt=-mno-avx512vbmi --copt=-mno-clwb --copt=-mno-mwaitx --copt=-mno-clzero --copt=--param l1-cache-size=32 --copt=--param l1-cache-line-size=64 --copt=--param l2-cache-size=2048 --copt=-mtune=core2" \
-t $USER/tensorflow/serving-devel \
-f tensorflow_serving/tools/docker/Dockerfile.devel .
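The prefixing step described above can be sketched as a small shell helper. This is only an illustration under assumptions: to_copts is a hypothetical name, the flag string is shortened, and the sed expression simply prefixes every token that starts with "-". It also shows where the space survives: the value after --param does not start with "-", so it stays attached to its --copt only by a space.

```shell
#!/bin/sh
# Hypothetical helper: put "--copt=" in front of every token starting with "-".
to_copts() {
    printf '%s\n' "$1" | sed -E 's/(^| )-/\1--copt=-/g'
}

# Shortened example of the gcc flag list (not the full list from above).
flags='-march=core2 -mmmx --param l1-cache-size=32 -mtune=core2'

to_copts "$flags"
# prints: --copt=-march=core2 --copt=-mmmx --copt=--param l1-cache-size=32 --copt=-mtune=core2
```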
Problem
However, bazel complains as follows, which is probably due to the space between --param and l1-cache-size=32, which is an option for the C compiler passed through to the bazel build call.
ERROR: Skipping 'l1-cache-line-size=64': couldn't determine target from filename 'l1-cache-line-size=64'
ERROR: couldn't determine target from filename 'l1-cache-line-size=64'
INFO: Elapsed time: 20.233s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
The command '/bin/sh -c bazel build --color=yes --curses=yes ${TF_SERVING_BAZEL_OPTIONS} --verbose_failures --output_filter=DONT_MATCH_ANYTHING ${TF_SERVING_BUILD_OPTIONS} tensorflow_serving/model_servers:tensorflow_model_server && cp bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server /usr/local/bin/' returned a non-zero code: 1
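One way to see why the space matters, assuming the RUN line is executed by /bin/sh with ${TF_SERVING_BUILD_OPTIONS} left unquoted (as the failing command above suggests): a POSIX shell splits an unquoted expansion at every space before the program receives its arguments. A minimal sketch with a hypothetical helper print_args and a shortened option string:

```shell
#!/bin/sh
# Hypothetical helper: print each received argument on its own line.
print_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Shortened stand-in for TF_SERVING_BUILD_OPTIONS.
OPTS='--copt=-mtune=core2 --copt=--param l1-cache-size=32'

# Unquoted on purpose, to mirror the RUN line: the value is split at spaces.
print_args ${OPTS}
# prints:
# [--copt=-mtune=core2]
# [--copt=--param]
# [l1-cache-size=32]
```

In this sketch l1-cache-size=32 arrives as a separate argument, which would be consistent with bazel trying to interpret it as a target.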
What I tried
- I tried escaping the space char in the last flags:
TF_SERVING_BUILD_OPTIONS="--copt=-mmmx ... --copt=--param\ l1-cache-size=32 --copt=--param\ l1-cache-line-size=64 --copt=--param\ l2-cache-size=2048 --copt=-mtune=core2 "
But bazel still complains with the same error message as above.
- I tried enclosing the commands in double or single quotes:
TF_SERVING_BUILD_OPTIONS="--copt=-mmmx ... --copt=\"--param l1-cache-size=32\" --copt=\"--param l1-cache-line-size=64\" --copt=\"--param l2-cache-size=2048\" --copt=-mtune=core2 "
The same error as before appears.
- I tried using inner double quotes for the copts and wrapping TF_SERVING_BUILD_OPTIONS in outer single quotes, but got the same error.
- I tried escaping the double quotes around the copts with \x22. A similar error appears, this time indicating that the target is malformed:
ERROR: Skipping 'l1-cache-size=32\x22': Bad target pattern...
- I tried escaping the space char with \40:
TF_SERVING_BUILD_OPTIONS="--copt=-mmmx ... --copt=--param\40l1-cache-size=32 --copt=--param\40l1-cache-line-size=64 --copt=--param\40l2-cache-size=2048 --copt=-mtune=core2 "
This time bazel didn't complain, since the argument of copt was one string without normal spaces. However, the arguments are passed incorrectly to gcc, since I get the following error:
ERROR: /root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/external/grpc/BUILD:692:1: C++ compilation of rule '@grpc//:grpc_base_c' failed (Exit 1): gcc failed: error executing command
(cd /root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving && \
exec env - \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
PWD=/proc/self/cwd \
PYTHON_BIN_PATH=/usr/bin/python \
/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/k8-opt/bin/external/grpc/_objs/grpc_base_c/endpoint_pair_uv.d '-frandom-seed=bazel-out/k8-opt/bin/external/grpc/_objs/grpc_base_c/endpoint_pair_uv.o' '-DGRPC_ARES=0' -iquote external/grpc -iquote bazel-out/k8-opt/genfiles/external/grpc -iquote bazel-out/k8-opt/bin/external/grpc -iquote external/zlib_archive -iquote bazel-out/k8-opt/genfiles/external/zlib_archive -iquote bazel-out/k8-opt/bin/external/zlib_archive -isystem external/grpc/include -isystem bazel-out/k8-opt/genfiles/external/grpc/include -isystem bazel-out/k8-opt/bin/external/grpc/include -isystem external/zlib_archive -isystem bazel-out/k8-opt/genfiles/external/zlib_archive -isystem bazel-out/k8-opt/bin/external/zlib_archive -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mno-movbe -mno-aes -mno-sha -mno-pclmul -mno-popcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mno-avx -mno-avx2 -mno-sse4.2 -mno-sse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mno-xsave -mno-xsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-mwaitx -mno-clzero '--param\40l1-cache-size=32' '--param\40l1-cache-line-size=64' '--param\40l2-cache-size=2048' '-mtune=core2' '-std=c++14' '-D_GLIBCXX_USE_CXX11_ABI=0' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c external/grpc/src/core/lib/iomgr/endpoint_pair_uv.cc -o bazel-out/k8-opt/bin/external/grpc/_objs/grpc_base_c/endpoint_pair_uv.o)
Execution platform: @bazel_tools//platforms:host_platform
gcc: error: unrecognized command line option '--param\40l1-cache-size=32'
gcc: error: unrecognized command line option '--param\40l1-cache-line-size=64'
gcc: error: unrecognized command line option '--param\40l2-cache-size=2048'
Target //tensorflow_serving/model_servers:tensorflow_model_server failed to build
It seems that this is related to the following issue on github.
- I tried the compilation without the flags that contain spaces, and it finished fine, which strengthens the assumption that the error is due to the space being handled incorrectly by bazel.
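The outcomes of the attempts above can be reproduced outside of docker and bazel with a small shell sketch. It assumes the option string reaches /bin/sh as a plain variable that is expanded unquoted, and uses a hypothetical helper print_args plus shortened values. In a POSIX shell, backslashes and quotes that are part of a variable's value are not interpreted again after expansion; only the splitting at spaces happens.

```shell
#!/bin/sh
# Hypothetical helper: print each received argument on its own line.
print_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Shortened stand-ins for the values tried above (single-quoted, so literal).
backslash_space='--copt=--param\ l1-cache-size=32'
inner_quotes='--copt="--param l1-cache-size=32"'
octal_escape='--copt=--param\40l1-cache-size=32'

print_args ${backslash_space}
# prints:
# [--copt=--param\]
# [l1-cache-size=32]

print_args ${inner_quotes}
# prints:
# [--copt="--param]
# [l1-cache-size=32"]

print_args ${octal_escape}
# prints:
# [--copt=--param\40l1-cache-size=32]
```

Under these assumptions, the first two variants are still split at the space, while the \40 variant stays one argument but keeps the literal characters \40, which matches the option gcc rejected in the log above.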
How can I fix that problem?