How to speed up my TensorFlow execution on Hadoop?

Question

The following script runs very slowly. I just want to count the total number of lines in the Twitter follower graph (a text file of ~26 GB).

I need to perform a machine learning task. This is just a test of accessing data on HDFS from TensorFlow.

import tensorflow as tf
import time

filename_queue = tf.train.string_input_producer(["hdfs://default/twitter/twitter_rv.net"], num_epochs=1, shuffle=False)

def read_filename_queue(filename_queue):
    reader = tf.TextLineReader()
    _, line = reader.read(filename_queue)
    return line

line = read_filename_queue(filename_queue)

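session_conf = tf.ConfigProto(intra_op_parallelism_threads=1500, inter_op_parallelism_threads=1500)

with tf.Session(config=session_conf) as sess:
    # num_epochs is tracked in a local variable, so local variables must be initialized.
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    start = time.time()
    i = 0
    while True:
        try:
            # One sess.run call per line: each call returns a single record.
            sess.run([line])
        except tf.errors.OutOfRangeError:
            print('end of file')
            break

        # Count only after a successful read so the final total is not off by one.
        i = i + 1
        if i % 100000 == 0:
            print(i)
            print(time.time() - start)

    # Stop the queue-runner threads before leaving the session.
    coord.request_stop()
    coord.join(threads)
    print('total number of lines = ' + str(i))
    print(time.time() - start)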

The process needs about 40 seconds for the first 100000 lines. I tried setting intra_op_parallelism_threads and inter_op_parallelism_threads to 0, 4, 8, 40, 400 and 1500, but it didn't affect the execution time significantly.

Can you help me?

system specs:

16 GB RAM
4 CPU cores

What does a machine learning library have to do with counting the total number of lines in a file? Perhaps Dask/PySpark would be a better tool? – Ignacio Vergara Kausel, Jun 14 '17 at 13:37
I need to perform a machine learning task. This is just a test of accessing data on HDFS from TensorFlow. @IgnacioVergaraKausel – Phil, Jun 14 '17 at 13:39
Have you played with changing the values for `_op_parallelism_threads`? Any reason for the value 1500? Maybe that number is too high and ends up giving you a big overhead? – Ignacio Vergara Kausel, Jun 14 '17 at 13:47
I agree with the above comment: way too many threads. Try it at `20`... – l'L'l, Jun 14 '17 at 13:51
Or try 0 and let the system determine the value. See if there is an improvement, and tweak. Maybe a good idea would be to keep it in multiples of your number of cores/multithreading. – Ignacio Vergara Kausel, Jun 14 '17 at 13:53
I tried setting `intra_op_parallelism_threads` and `inter_op_parallelism_threads` to 0, 4, 8, 40, 400 and 1500, but it didn't affect the execution time significantly. @IgnacioVergaraKausel – Phil, Jun 14 '17 at 13:54
As the first comment says, why aren't you using Spark? https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html All of Hadoop is not required – OneCricketeer, Jun 18 '17 at 11:56
Why do you even need Python? https://stackoverflow.com/questions/12716570/count-lines-in-large-files – OneCricketeer, Jun 18 '17 at 12:02
Not entirely sure what will happen, but does the runtime change if you read two lines per call to run? E.g. add this line: `line2 = read_filename_queue(filename_queue)`. Then change `sess.run([line])` to `sess.run([line, line2])`. And of course increment `i` by 2 each time. – user2570465, Jun 22 '17 at 18:20

Tianjin Gu · Answer 1 · 2017-06-26T13:53:39.513

You can split the big file into smaller ones; it may help. Also set intra_op_parallelism_threads and inter_op_parallelism_threads to 0.
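A minimal sketch of the splitting step, assuming a local copy of the file is available (the helper name `split_into_shards` and the `part-00000` naming are made up for illustration, not part of TensorFlow or Hadoop). The resulting list of shard paths could then be passed to `tf.train.string_input_producer` in place of the single file name.

```python
import os
import tempfile


def split_into_shards(src_path, out_dir, lines_per_shard):
    """Split a text file into shards holding at most lines_per_shard lines each.

    Returns the list of shard paths in order. Hypothetical helper for illustration.
    """
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    out = None
    with open(src_path, "rb") as src:
        for line_no, line in enumerate(src):
            # Start a new shard every lines_per_shard lines.
            if line_no % lines_per_shard == 0:
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, "part-%05d" % len(shard_paths))
                shard_paths.append(path)
                out = open(path, "wb")
            out.write(line)
    if out is not None:
        out.close()
    return shard_paths


if __name__ == "__main__":
    # Tiny demo on a temporary file with 10 edge lines, 4 lines per shard.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "edges.txt")
    with open(src, "wb") as f:
        for i in range(10):
            f.write(b"%d\t%d\n" % (i, i + 1))
    shards = split_into_shards(src, os.path.join(workdir, "shards"), 4)
    print(len(shards))  # 3 shards: 4 + 4 + 2 lines
```

Whether more shards actually help depends on how many reader threads consume them, so treat the shard size as something to measure rather than a fixed recommendation.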

On many systems, reading a single raw text file with multiple processes is not easy. TensorFlow reads one file with only one thread, so adjusting TensorFlow's thread counts won't help. Spark can process a file with multiple threads because it divides the file into blocks; each thread reads the lines of its own block and ignores the characters before the first \n, since they belong to the last line of the previous block. For batch data processing Spark is the better choice, while TensorFlow is better for machine learning/deep learning tasks.
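The block-based reading described above can be sketched in plain Python. This is only an illustration of the idea under simplifying assumptions (a local file, byte ranges of equal size), not Spark's actual implementation; the function names are hypothetical. Each worker counts the lines that start inside its byte range and skips the partial line at the beginning of the range, because that line belongs to the previous block.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, num_blocks):
    """Cut [0, file_size) into up to num_blocks contiguous byte ranges."""
    step = max(1, -(-file_size // num_blocks))  # ceiling division
    return [(s, min(s + step, file_size)) for s in range(0, file_size, step)]


def count_lines_in_block(path, start, end):
    """Count the lines whose first byte lies in the byte range [start, end)."""
    count = 0
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard up to the next newline. If the
            # previous byte is a newline, nothing of our own block is lost.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            count += 1
    return count


def count_lines_parallel(path, num_blocks=4):
    """Sum the per-block counts, one thread per block."""
    ranges = block_ranges(os.path.getsize(path), num_blocks)
    with ThreadPoolExecutor(max_workers=num_blocks) as pool:
        return sum(pool.map(lambda r: count_lines_in_block(path, *r), ranges))


if __name__ == "__main__":
    # Tiny demo: 1000 lines of varying length, split into 4 blocks.
    demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
    with open(demo, "wb") as f:
        for i in range(1000):
            f.write(b"x" * (i % 17) + b"\n")
    print(count_lines_parallel(demo, 4))  # 1000
```

A line that straddles a block boundary is counted by the block in which it starts, so every line is counted exactly once regardless of the number of blocks. In CPython, threads mainly help here when the reads are I/O-bound.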

score 2 · Answer 2 · edited Sep 19 '18 at 06:31

https://github.com/linkedin/TonY

With TonY, you can submit a TensorFlow job and specify number of workers and whether they require CPUs or GPUs.

We were able to get almost-linear speedup when running on multiple servers with TonY (Inception v3 model): [image]

Below is an example of how to use it from the README:

In the tony directory there’s also a tony.xml which contains all of your TonY job configurations. For example:

$ cat tony/tony.xml
<configuration>
  <property>
    <name>tony.worker.instances</name>
    <value>4</value>
  </property>
  <property>
    <name>tony.worker.memory</name>
    <value>4g</value>
  </property>
  <property>
    <name>tony.worker.gpus</name>
    <value>1</value>
  </property>
  <property>
    <name>tony.ps.memory</name>
    <value>3g</value>
  </property>
</configuration>

For a full list of configurations, please see the wiki.

Model code

$ ls src/models/ | grep mnist_distributed
  mnist_distributed.py

Then you can launch your job:

$ java -cp "`hadoop classpath --glob`:tony/*:tony" \
            com.linkedin.tony.cli.ClusterSubmitter \
            -executes src/models/mnist_distributed.py \
            -task_params '--input_dir /path/to/hdfs/input --output_dir /path/to/hdfs/output --steps 2500 --batch_size 64' \
            -python_venv my-venv.zip \
            -python_binary_path Python/bin/python \
            -src_dir src \
            -shell_env LD_LIBRARY_PATH=/usr/java/latest/jre/lib/amd64/server

The command line arguments are as follows:
* executes describes the location of the entry point of your training code.
* task_params describes the command line arguments which will be passed to your entry point.
* python_venv describes the name of the zip locally which will invoke your python script.
* python_binary_path describes the relative path in your python virtual environment which contains the python binary, or an absolute path to use a python binary already installed on all worker nodes.
* src_dir specifies the name of the root directory locally which contains all of your python model source code. This directory will be copied to all worker nodes.
* shell_env specifies key-value pairs for environment variables which will be set in your python worker/ps processes.
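To make the role of task_params concrete, here is a hedged sketch of how an entry point might parse those arguments. The flag names are taken from the launch command above; the function `parse_task_params` and the default values are assumptions for illustration and are not taken from TonY's actual mnist_distributed.py.

```python
import argparse
import shlex


def parse_task_params(argv=None):
    """Parse the flags that -task_params forwards to the training script.

    Hypothetical sketch: only the flag names come from the launch command.
    """
    parser = argparse.ArgumentParser(description="Sketch of a training entry point")
    parser.add_argument("--input_dir", required=True, help="HDFS path to read from")
    parser.add_argument("--output_dir", required=True, help="HDFS path to write to")
    parser.add_argument("--steps", type=int, default=2500)
    parser.add_argument("--batch_size", type=int, default=64)
    return parser.parse_args(argv)


# The same string that is passed via -task_params in the launch command.
task_params = ("--input_dir /path/to/hdfs/input --output_dir /path/to/hdfs/output "
               "--steps 2500 --batch_size 64")
args = parse_task_params(shlex.split(task_params))
print(args.steps, args.batch_size)  # 2500 64
```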

mrk · Answer 3 · 2017-06-25T11:43:24.213

I am also a beginner with TensorFlow, but since you asked for answers drawing from credible and/or official sources, here is what I found that might help:

Build and install from source
Utilize queues for reading data
Preprocessing on the CPU
Use NCHW image data format
Place shared parameters on the GPU
Use fused batch norm

Note: The points listed above are explained in greater detail in the TensorFlow performance guide.

Another thing you might want to look into is quantization. The documentation on it explains how to use quantization to reduce model size, both in storage and at runtime. Quantization can improve performance, especially on mobile hardware.

Phil · score 0 · Accepted Answer · answered Jun 25 '17 at 13:32

I've bypassed this performance problem by using Spark instead.

Well, basically what I suggested firstmost ;) – Ignacio Vergara Kausel Jun 27 '17 at 12:06

Golta hazzy · score -1 · Answer 5 · answered Jun 14 '17 at 17:48

Try this and it should improve your timing:

session_conf = tf.ConfigProto(intra_op_parallelism_threads=0, inter_op_parallelism_threads=0)

It is better not to set these config values yourself when you do not know what the optimum is; 0 lets the system choose.

https://stackoverflow.com/questions/44546379/how-to-speedup-my-tensorflow-execution#comment76084256_44546379 I already tried this – Phil Jun 15 '17 at 04:19
