
I have an Ubuntu 18.04 installation on a computer with the following CPU and GPU properties:

..$ cat /proc/cpuinfo
...
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts
acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
aes xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp
tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms invpcid mpx rdseed smap
clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window
hwp_epp md_clear flush_l1d

..$ nvidia-smi
Sat Nov 16 13:41:35 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
|  8%   56C    P0    37W / 150W |    216MiB /  3016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
...
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 106...  Off  | 00000000:07:00.0 Off |                  N/A |
|  0%   26C    P8     5W / 150W |      2MiB /  3019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Note that my CPU does not support the AVX or AVX2 instructions, and that I have both CUDA and the Nvidia driver installed (along with nvidia-docker).
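The flag check can also be scripted rather than read by eye. This is a minimal sketch, assuming Linux-style /proc/cpuinfo text; the helper names `cpu_flags` and `missing_features` are made up for illustration, and the sample string is an abbreviated stand-in for the real flags line above.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The first "flags" line is enough; cores normally report the same set.
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("avx", "avx2")):
    """List the required feature names that are absent from the flags line."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]


# Abbreviated sample (hypothetical excerpt), not the full flags line.
sample = "flags\t\t: fpu vme sse sse2 ssse3 sse4_1 sse4_2 popcnt aes\n"
print(missing_features(sample))  # ['avx', 'avx2']
```

On a real machine the text would come from reading /proc/cpuinfo, for example `missing_features(open("/proc/cpuinfo").read())`; an empty list means every required feature is reported.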


On my bare-metal system I have tensorflow 1.14.0 installed. I chose this version specifically because of the issue I am facing, better explained in this SO QA: as I understand it, after tensorflow 1.15.0 the AVX instructions are used by default. With this version installed on my bare-metal machine I can import the library and train a model successfully.

(base) :~$ conda list
# packages in environment at /home/kevin/anaconda3:
#
# Name                    Version                   Build  Channel
...
keras                     2.2.4                         0  
keras-applications        1.0.8                      py_0  
keras-base                2.2.4                    py37_0  
keras-preprocessing       1.1.0                      py_1  
...
python                    3.7.3                h0371630_0  
...
tensorboard               1.14.0           py37hf484d3e_0  
tensorflow                1.14.0          mkl_py37h45c423b_0  
tensorflow-base           1.14.0          mkl_py37h7ce6ba3_0  
tensorflow-estimator      1.14.0                     py_0  
...
(base) :~$ python -c "import tensorflow"
(base) :~$ 

My issue arises when I try to use a tensorflow Docker image built from the following Dockerfile:

FROM tensorflow/tensorflow:1.14.0-gpu-py3
LABEL description="SRCNN-Nvidia-Docker-Keras"
WORKDIR /app
# Install the libraries required for opencv-python
RUN apt-get update && apt-get install -y libsm6 libxext6 libxrender-dev
# Install the required python libraries
COPY library-requirements.txt .
RUN pip install -r library-requirements.txt
# Create a mount point in the container to link file systems
VOLUME /app/SRCNN
# library-requirements.txt lists: keras, numpy, matplotlib, h5py, pillow, opencv-python, scipy

I can successfully build the image and run the container, but inside the container I cannot import tensorflow:

root@8a221a7eca5f:/app# pip list
Package              Version 
-------------------- --------
...  
Keras                2.3.1   
Keras-Applications   1.0.8   
Keras-Preprocessing  1.1.0   
...
tensorboard          1.14.0  
tensorflow-estimator 1.14.0  
tensorflow-gpu       1.14.0  
...
root@8a221a7eca5f:/app# python -c "import tensorflow"
Illegal instruction (core dumped)
root@8a221a7eca5f:/app# 

As far as I can tell, the only reason I should see the Illegal instruction error is loading tensorflow > 1.15.0 on a CPU without AVX. Yet with version 1.14 I can import tensorflow on my bare-metal machine, but not in the 1.14 Docker container.
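To compare the two environments without crashing the shell each time, the import can be attempted in a child interpreter and the exit status inspected. This is a rough sketch, assuming a POSIX system where `subprocess` reports death by signal as a negative return code; `probe_import` is a hypothetical helper name, not part of tensorflow or Docker.

```python
import signal
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a child interpreter and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return "ok"
    # On POSIX, a child killed by signal N has returncode -N.
    if result.returncode == -signal.SIGILL:
        return "illegal instruction"
    return "failed (exit code %d)" % result.returncode


if __name__ == "__main__":
    print(probe_import("tensorflow"))
```

Running the same script on the host and inside the container gives a like-for-like result: "illegal instruction" points at an unsupported CPU instruction in that particular build, while any other failure points elsewhere (for example a missing package).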

What else could be the cause of this?

Is my only real solution to simply compile tensorflow from source in the docker image?

KDecker

1 Answer


If the culprit is AVX support (and I think this is the case), then instead of compiling it yourself you can use community wheels; there are a few compiled without AVX.

oktogen
  • Ahh, now that I think about it. The tensorflow 1.14 and 1.15 docker images were compiled in the past few months or weeks. Possibly when they were built they did not actually turn off AVX because they were built recently. – KDecker Nov 17 '19 at 04:21
  • After attempting to install various wheels, then breaking down and trying to compile from source (took ~22 hours on my little G3930), I went with the simplest solution. I just bought a new CPU for the machine. – KDecker Nov 18 '19 at 13:59
  • That is also a solution. Alternatively, if you are OK with CPU only, you can use a free trial from Google Cloud Platform, Azure and so on, and use those platforms' resources for free during the trial (but you can't use a GPU or TPU on the Google Cloud Platform free trial). Most trials are very generous ... – oktogen Nov 18 '19 at 18:28