Please excuse the broadness of this question; once I know more, I can ask something more specific.
I have a performance-sensitive piece of TensorFlow code. From the perspective of someone who knows little about GPU programming, I would like to know which guides or strategies would be a good place to start for optimizing my code (single GPU).
Even just a readout of how long is spent on each TensorFlow op would be nice.
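For concreteness, I've come across mentions of a Chrome-trace "timeline" mechanism. Is something like the sketch below (assuming the session-based API and the `tensorflow.python.client.timeline` module are the right tools; the matmul graph is just a stand-in for my real model) the kind of thing that gives that readout?

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Toy graph standing in for my real, performance-sensitive model.
a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
c = tf.matmul(a, b)

# Ask the runtime to record per-op timing for this step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(c, options=run_options, run_metadata=run_metadata)

    # Dump a Chrome trace; loading timeline.json at chrome://tracing
    # shows each op's start time and duration per device.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())
```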
I have a vague understanding that
- Some operations run faster when assigned to a CPU rather than a GPU, but it's not clear to me which ones (see the placement sketch after this list).
- There is a piece of Google software called "EEG", which I read about in a paper, that may one day be open-sourced.
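On the first point, my rough understanding is that placement can be forced with `tf.device`, something like the sketch below. The embedding lookup is just a made-up example of an op people seem to pin to the CPU; I don't actually know which ops benefit from this.

```python
import tensorflow as tf

# Pin the embedding variable and lookup to the CPU while the rest of
# the graph runs on the GPU (unsure when this actually helps).
with tf.device('/cpu:0'):
    embeddings = tf.Variable(tf.random_uniform([50000, 128], -1.0, 1.0))
    lookup = tf.nn.embedding_lookup(embeddings, tf.constant([1, 7, 42]))

with tf.device('/gpu:0'):
    result = tf.reduce_sum(lookup)

# log_device_placement prints which device each op actually landed on;
# allow_soft_placement falls back to CPU if an op has no GPU kernel.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(result))
```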
There may also be other common factors at play that I am not aware of.