What is the correct way of setting up Tensorflow on Linux, after all?

Question

I'm struggling with conflicting information about TensorFlow. There is lots of info in lots of places, and it is never complete enough.

I have my system set up with CUDA 8.0 and cuDNN, and I have Keras + Theano working OK with Python 2.7. I'm trying to move to TensorFlow.

Because I had compatibility problems with numpy and other packages when I tried to install it in the same environment, I installed miniconda2, created a virtual environment for it with conda create -n tensorflow pip and activated it, as instructed here: https://www.tensorflow.org/install/install_linux#InstallingAnaconda

The environment seems operational.

Afterwards, I installed TensorFlow from https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl and also Keras, only to notice that some modules were duplicated in conda list: some marked with a version string, others marked only with <pip>. Specifically, I got both tensorflow-gpu 1.2.1 and tensorflow 1.1.0. The old version just comes along with Keras.
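Both packages install the same `tensorflow` Python module, so having tensorflow and tensorflow-gpu in one environment means `import tensorflow` may not load the build you expect. Below is a minimal sketch for spotting the duplicate from `conda list` output. The helpers `tf_entries` and `has_conflict` are hypothetical, and the sketch assumes the whitespace-separated name/version/build columns that `conda list` prints, with pip-installed packages showing `<pip>` as described above.

```python
"""Spot duplicate TensorFlow packages in `conda list` output (sketch)."""
import subprocess

# Package names to look for (pip may spell the GPU one "tensorflow_gpu").
TF_NAMES = ("tensorflow", "tensorflow-gpu")


def tf_entries(conda_list_text):
    """Return [(name, version, build)] for the TensorFlow rows of `conda list`."""
    rows = []
    for line in conda_list_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and the header comments
        name = parts[0].lower().replace("_", "-")
        if name in TF_NAMES:
            version = parts[1] if len(parts) > 1 else ""
            build = parts[2] if len(parts) > 2 else ""  # "<pip>" for pip installs
            rows.append((name, version, build))
    return rows


def has_conflict(rows):
    """True if both the CPU and the GPU package appear in the same environment."""
    names = {name for name, _, _ in rows}
    return set(TF_NAMES) <= names


if __name__ == "__main__":
    try:
        out = subprocess.check_output(["conda", "list"]).decode("utf-8")
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run conda list:", exc)
    else:
        rows = tf_entries(out)
        print(rows)
        if has_conflict(rows):
            print("Both tensorflow and tensorflow-gpu are installed in this env.")
```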

Also, there's a myriad of warnings about TensorFlow not being compiled to use certain CPU instruction sets. There's an answer to "How to compile Tensorflow with SSE4.2 and AVX instructions?" about compiling it with bazel, but I can't find any information about where to put the source code and which files to move where after running that bazel command line.
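Those warnings are harmless for correctness: they say the prebuilt wheel was not compiled to use some instruction sets that your CPU supports. To see which of the instruction sets named in the warnings your CPU actually advertises (and so what a bazel source build could enable), here is a small Linux-only sketch that reads `/proc/cpuinfo`. The `WANTED` list is an assumption based on the usual SSE4.1/SSE4.2/AVX/AVX2/FMA warnings.

```python
"""Check which SIMD instruction sets the CPU advertises (Linux only, sketch)."""

# Flag names as they appear in /proc/cpuinfo; this selection is an assumption
# based on the instruction sets TensorFlow 1.x usually warns about.
WANTED = ("sse4_1", "sse4_2", "avx", "avx2", "fma")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like: "flags\t\t: fpu vme de pse ..."
            return set(line.split(":", 1)[1].split())
    return set()


def supported(flags, wanted=WANTED):
    """Map each wanted instruction set to True/False."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(supported(cpu_flags(f.read())))
    except OSError as exc:
        print("cannot read /proc/cpuinfo:", exc)
```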

To make matters worse, whenever I run a simple 20x20 matrix multiplication with "/gpu:0" as the device, the code prints those horrendous warnings and correctly detects the presence of a GTX 1070, but never really confirms that the GPU was used to do the calculations. And it runs faster on "/cpu:0". How I miss Theano...

Could someone point me to:

which version of TensorFlow to download that is current (not necessarily the latest)?
concise steps to get it done, and how to test whether those steps went right?

I'm using Linux Mint 18.

StackOverflow is about helping people fix their programming code. Requests for install debugging, tutorials, research, tools, recommendations, libraries, and code are off-topic. ***Please*** read http://stackoverflow.com/help/how-to-ask , http://stackoverflow.com/help/dont-ask , http://stackoverflow.com/help/mcve and take the [tour](http://stackoverflow.com/tour) before posting more Qs here. Good luck. — shellter, Jul 06 '17 at 18:28
AND post 1 problem at a time with code/inputs/expected output/current output/error messages. Good luck. — shellter, Jul 06 '17 at 18:29

Krishna · Accepted Answer · 2017-07-09T05:11:22.320

I have used conda and installed tensorflow=1.1.0, but it never seemed to work correctly within Python. I also came across GitHub issues saying that Anaconda is currently working on the TensorFlow GPU version, and no matter what I tried in Anaconda, it never used my NVIDIA Tesla P100-SXM2-16GB card; it used only the CPU.

I suggest you use the normal (non-Anaconda) environment until they get tensorflow-gpu to work properly in Anaconda.

To check whether tensorflow-gpu works, I used the Inception v3 model with TF 0.12 / TF 1.0.

This is the process I go through to install TensorFlow 1.0:

Step 0.

sudo -i
apt-get update
apt-get install aptitude
aptitude install software-properties-common
apt-get install libcupti-dev python-pip
apt-get upgrade libc6

Step 1. Install the NVIDIA components (I think you already have these installed).

Download the NVIDIA cuDNN 5.1 for CUDA 8.0 from https://developer.nvidia.com/rdp/cudnn-download (Registration in NVIDIA's Accelerated Computing Developer Program is required)

cuDNN 5.1 works well with most architectures and operating systems out there.
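To confirm which cuDNN version is actually installed, one option is to read the version macros from its header. The sketch below assumes the header was copied to `/usr/local/cuda/include/cudnn.h` (the `CUDNN_HEADER` path is an assumption; adjust it to your setup) and relies on the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` defines in cudnn.h.

```python
"""Read the installed cuDNN version from its header (sketch)."""
import re

# Assumed install location of the cuDNN header on a CUDA 8.0 setup.
CUDNN_HEADER = "/usr/local/cuda/include/cudnn.h"


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the CUDNN_* #define lines, or None."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+%s\s+(\d+)" % macro, header_text)
        if m is None:
            return None  # macro missing: not a cuDNN header, or a different layout
        parts.append(int(m.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    try:
        with open(CUDNN_HEADER) as f:
            print("cuDNN version:", cudnn_version(f.read()))
    except OSError as exc:
        print("could not read", CUDNN_HEADER, "-", exc)
```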

Step 2. Install bazel and tensorflow

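apt-get install bazel

(This requires Bazel's own APT repository to be added first; see the Bazel installation instructions for Ubuntu.)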

Then go to https://pypi.python.org/pypi/tensorflow-gpu/1.1.0rc0, pick the wheel that matches your Python version, and run

pip install <python-wheel-version>

If you have both Python 2.7 and Python 3.x installed, use pip2 to install for Python 2.7.
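The cp27 tag in a wheel name (as in the tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl file from the question) means the wheel is built for CPython 2.7. This is why pip2 matters: a pip running under a different interpreter rejects the wheel as not supported on this platform. Here is a small sketch that splits a wheel filename into the tags defined by PEP 427 and checks the Python tag against an interpreter version. `parse_wheel` and `matches_interpreter` are hypothetical helpers, and only CPython `cpXY` tags are matched (a generic tag such as `py2.py3` reports False).

```python
"""Check whether a TensorFlow wheel matches a given Python version (sketch)."""
import sys


def parse_wheel(filename):
    """Split a PEP 427 wheel filename into its tags.

    Format: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %s" % filename)
    parts = filename[:-4].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError("unexpected wheel name: %s" % filename)
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def matches_interpreter(wheel, version_info=sys.version_info):
    """True if the wheel's CPython tag (e.g. cp27) matches the given version."""
    wanted = "cp%d%d" % (version_info[0], version_info[1])
    return wanted in wheel["python"].split(".")


if __name__ == "__main__":
    w = parse_wheel("tensorflow_gpu-1.2.1-cp27-none-linux_x86_64.whl")
    print(w, "matches this interpreter:", matches_interpreter(w))
```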

Step 3. Install openjdk

apt-get install openjdk-8-jdk
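Before moving on, it can help to confirm that the tools from these steps are actually on your PATH. Here is a minimal sketch using `shutil.which`; the `REQUIRED` tuple is an assumption covering pip, bazel, java (from openjdk-8-jdk) and git, which the next step uses.

```python
"""Verify that the command-line tools used in these steps are on PATH (sketch)."""
import shutil

# Tools installed or used in Steps 0-4; use "pip2" instead of "pip" if both
# Python 2.7 and 3.x are installed.
REQUIRED = ("pip", "bazel", "java", "git")


def missing_tools(tools=REQUIRED, which=shutil.which):
    """Return the tools that `which` cannot find, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", missing or "none")
```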

Step 4. git clone the Inception model code

git clone https://github.com/tensorflow/models.git
cd models
git checkout master
cd inception

This is where bazel comes into the picture. See Bazel's Getting Started docs for a more detailed explanation of what a target is. So, if you run

ls -lstr

You might see five Bazel-related symbolic links (they are created by the first bazel build in this directory):

bazel-bin  bazel-genfiles  bazel-inception  bazel-out  bazel-testlogs

These are the output directories into which Bazel builds your specific model.

Assuming you're in the models/inception directory, run:

bazel build inception/imagenet_train

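This builds the training target; the resulting binary is reachable through the bazel-bin symbolic link as bazel-bin/inception/imagenet_train.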
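After the build, you can inspect where each convenience symlink points. A sketch, assuming you run it from models/inception; `bazel_links` is a hypothetical helper that lists only the entries whose names start with `bazel-` and that are symbolic links.

```python
"""List Bazel's convenience symlinks in a workspace and their targets (sketch)."""
import os


def bazel_links(workspace="."):
    """Return {link_name: target} for every bazel-* symlink in `workspace`."""
    links = {}
    for entry in sorted(os.listdir(workspace)):
        path = os.path.join(workspace, entry)
        if entry.startswith("bazel-") and os.path.islink(path):
            links[entry] = os.readlink(path)
    return links


if __name__ == "__main__":
    # Run from models/inception after `bazel build inception/imagenet_train`.
    for name, target in bazel_links().items():
        print("%-16s -> %s" % (name, target))
```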

NOTE: For imagenet_train.py to work, you need to prepare the ImageNet dataset. You can either skip this part or go through Step 5:

Step 5. Prepare the ImageNet dataset

Before you run the training script for the first time, you need to download and convert the ImageNet data to the native TFRecord format. To begin, sign up for an account with ImageNet to gain access to the data: look for the sign-up page, create an account, and request an access key to download the data.

After you have a USERNAME and PASSWORD, you are ready to run the script. Make sure that your hard disk has at least 500 GB of free space for downloading and storing the data. Here we select DATA_DIR=$HOME/imagenet-data as that location, but feel free to change it.
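To check the 500 GB requirement before starting the download, here is a sketch built on `shutil.disk_usage` (Python 3.3+). Because DATA_DIR may not exist yet, it measures the nearest existing parent directory. The helper names and the use of decimal gigabytes are assumptions.

```python
"""Check that the ImageNet download location has enough free space (sketch)."""
import os
import shutil

GB = 1000 ** 3  # decimal gigabytes (assumption)


def nearest_existing_dir(path):
    """Walk up from `path` until an existing directory is found."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def has_free_space(path, required_bytes=500 * GB):
    """True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(nearest_existing_dir(path))
    return usage.free >= required_bytes


if __name__ == "__main__":
    data_dir = os.environ.get("DATA_DIR", "~/imagenet-data")
    print(data_dir, "has 500 GB free:", has_free_space(data_dir))
```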

When you run the below script, please enter USERNAME and PASSWORD when prompted. This will occur at the very beginning. Once these values are entered, you will not need to interact with the script again.

# Location where the ImageNet data will be placed
DATA_DIR=$HOME/imagenet-data

Here $HOME is /root, because the commands run as root after sudo -i.

# build the preprocessing script.
bazel build inception/download_and_preprocess_imagenet

# run it
bazel-bin/inception/download_and_preprocess_imagenet "${DATA_DIR}"
# Place the resulting TFRecord files at /root/dataset
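Once preprocessing finishes, a quick count of the output files shows whether every shard was written. This sketch assumes the shards are named `<split>-NNNNN-of-MMMMM` (for example train-00000-of-01024). Check the actual names in your output directory and adjust the hypothetical `SHARD_RE` pattern if they differ. The /root/dataset path matches the --data_dir used for training.

```python
"""Count the TFRecord shards produced by the preprocessing script (sketch)."""
import glob
import os
import re

# Assumed shard naming: "<split>-NNNNN-of-MMMMM".
SHARD_RE = re.compile(r"^(train|validation)-(\d{5})-of-(\d{5})$")


def shard_report(data_dir):
    """Return {split: (found, expected)} for the shards in `data_dir`."""
    report = {}
    for path in glob.glob(os.path.join(data_dir, "*")):
        m = SHARD_RE.match(os.path.basename(path))
        if m is None:
            continue  # not a shard file
        split, expected = m.group(1), int(m.group(3))
        found, _ = report.get(split, (0, expected))
        report[split] = (found + 1, expected)
    return report


if __name__ == "__main__":
    for split, (found, expected) in sorted(shard_report("/root/dataset").items()):
        status = "complete" if found == expected else "INCOMPLETE"
        print("%-10s %d/%d %s" % (split, found, expected, status))
```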

Step 6. Source bazel and tensorflow

This step is very important. It activates the Python packages, and I think you may be getting errors because the TensorFlow Python package is not activated. Note that a plain pip install does not create these /opt/DL activate scripts; skip these two lines if the paths do not exist on your system.

source /opt/DL/bazel/bin/bazel-activate
source /opt/DL/tensorflow/bin/tensorflow-activate

If you have skipped Step 5, you might want to go to

/models/inception/sample

and run the gpu.py script:

python gpu.py

This should verify that your TensorFlow version works with your GPU.

You can also check by importing TensorFlow into Python, e.g. import tensorflow as tf

Find a hello-world example on their site; if it gives errors, TensorFlow has not been installed properly.
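The import check can be scripted so that it reports the version instead of crashing. Below is a minimal sketch (the `tensorflow_status` helper is hypothetical). A successful import does not prove the GPU is used. In TensorFlow 1.x, creating the session with tf.Session(config=tf.ConfigProto(log_device_placement=True)) logs the device each operation is placed on.

```python
"""Confirm that TensorFlow imports and report its version (sketch)."""
import importlib


def tensorflow_status(module_name="tensorflow"):
    """Return (True, version) if the module imports, else (False, error text)."""
    try:
        tf = importlib.import_module(module_name)
    except ImportError as exc:  # also covers missing CUDA/cuDNN libraries in TF 1.x
        return False, str(exc)
    return True, getattr(tf, "__version__", "unknown")


if __name__ == "__main__":
    ok, info = tensorflow_status()
    print("TensorFlow OK, version %s" % info if ok else "Import failed: %s" % info)
```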

Step 7. Run the ImageNet training (you can skip this step if you skipped Step 5):

bazel-bin/inception/imagenet_train --num_gpus=1 --batch_size=256 --train_dir=/tmp --data_dir=/root/dataset/ --max_steps=100
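If you prefer to launch the training from a Python script, the same flags can be assembled into an argument list for `subprocess.run` (Python 3.5+). Below is a sketch. `train_command` is a hypothetical helper whose defaults simply mirror the command above, and the run only starts when the built binary is present in the current directory.

```python
"""Launch the Step 7 training run from Python with the same flags (sketch)."""
import os
import subprocess

# Binary produced by `bazel build inception/imagenet_train`.
TRAIN_BIN = "bazel-bin/inception/imagenet_train"


def train_command(num_gpus=1, batch_size=256, train_dir="/tmp",
                  data_dir="/root/dataset/", max_steps=100):
    """Build the argument list for imagenet_train; defaults mirror Step 7."""
    return [
        TRAIN_BIN,
        "--num_gpus=%d" % num_gpus,
        "--batch_size=%d" % batch_size,
        "--train_dir=%s" % train_dir,
        "--data_dir=%s" % data_dir,
        "--max_steps=%d" % max_steps,
    ]


if __name__ == "__main__":
    cmd = train_command()
    print(" ".join(cmd))
    # Only launch from models/inception after the build; check=True raises
    # CalledProcessError if training exits with a non-zero status.
    if os.path.exists(TRAIN_BIN):
        subprocess.run(cmd, check=True)
```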
