I have an Ubuntu 18.04 installation on a computer with the following CPU and GPU properties:
..$ cat /proc/cpuinfo
...
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts
acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
aes xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp
tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms invpcid mpx rdseed smap
clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window
hwp_epp md_clear flush_l1d
..$ nvidia-smi
Sat Nov 16 13:41:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26 Driver Version: 430.26 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 106... Off | 00000000:01:00.0 On | N/A |
| 8% 56C P0 37W / 150W | 216MiB / 3016MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
...
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 106... Off | 00000000:07:00.0 Off | N/A |
| 0% 26C P8 5W / 150W | 2MiB / 3019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
It is important to note that my CPU supports neither the AVX nor the AVX2 instruction set, and that I have both CUDA and the Nvidia driver installed (along with nvidia-docker).
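For anyone who wants to check the same thing on their own machine, here is a minimal sketch that parses the flags line of /proc/cpuinfo and reports which of avx and avx2 are absent. The function names parse_cpu_flags and missing_flags are my own, chosen for illustration, and the script assumes a Linux system that exposes /proc/cpuinfo.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the exact key so that other lines ending in 'flags' are skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(flags, required=("avx", "avx2")):
    """Return the required flags that are not present, in the order given."""
    return [name for name in required if name not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        print("missing:", missing_flags(flags))
```

With the flags listed above, neither avx nor avx2 appears, so both would be reported as missing.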
On my bare metal system I have tensorflow 1.14.0 installed. I chose this version specifically because of the issue I am facing, which is explained in more detail in this SO QA: as I understand it, after tensorflow 1.15.0 AVX instructions are used by default. With tensorflow 1.14.0 installed on my bare metal machine I can import the library and train a model successfully.
(base) :~$ conda list
# packages in environment at /home/kevin/anaconda3:
#
# Name Version Build Channel
...
keras 2.2.4 0
keras-applications 1.0.8 py_0
keras-base 2.2.4 py37_0
keras-preprocessing 1.1.0 py_1
...
python 3.7.3 h0371630_0
...
tensorboard 1.14.0 py37hf484d3e_0
tensorflow 1.14.0 mkl_py37h45c423b_0
tensorflow-base 1.14.0 mkl_py37h7ce6ba3_0
tensorflow-estimator 1.14.0 py_0
...
(base) :~$ python -c "import tensorflow"
(base) :~$
My issue arises when I try to use a tensorflow Docker image built from the following Dockerfile:
FROM tensorflow/tensorflow:1.14.0-gpu-py3
LABEL description="SRCNN-Nvidia-Docker-Keras"
WORKDIR /app
# Install the libraries required for opencv-python
RUN apt-get update
RUN apt-get install -y libsm6 libxext6 libxrender-dev
# Install the required python libraries
ADD library-requirements.txt .
RUN pip install -r library-requirements.txt
# Create a mount point in the container to link file systems
VOLUME /app/SRCNN
# library-requirements.txt: keras, numpy, matplotlib, h5py, pillow, opencv-python, scipy
I can successfully build the image and run the container, but inside the container I cannot import tensorflow:
root@8a221a7eca5f:/app# pip list
Package Version
-------------------- --------
...
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
...
tensorboard 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
...
root@8a221a7eca5f:/app# python -c "import tensorflow"
Illegal instruction (core dumped)
root@8a221a7eca5f:/app#
As far as I can tell, the only reason I should see the Illegal instruction error is when loading tensorflow > 1.15.0 on a CPU without AVX instructions. Yet with version 1.14 I can import tensorflow on my bare metal machine, but not in the 1.14 Docker container.
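To confirm that the crash really is an illegal-instruction signal (SIGILL) and not some other failure, the import can be run in a child process and its exit status inspected. The following is only a rough sketch: describe_exit and try_import are hypothetical helper names, and it assumes Python 3.5 or newer, so it can be run both on the host and inside the container.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exit code %d" % returncode


def try_import(module, python=sys.executable):
    """Import `module` in a fresh interpreter and describe how that process ended."""
    proc = subprocess.run(
        [python, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    print(try_import("tensorflow"))
```

If the result is "killed by SIGILL", the binary executed an instruction the CPU does not implement; any other result would point to a different kind of problem.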
What else could be the cause of this?
Is my only real solution to simply compile tensorflow from source in the docker image?