A common and important question when developing DNNs is which operations take how long, and how they are distributed across devices and threads.
This used to be possible in TensorFlow v1 by passing tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE) to session.run(); see Can I measure the execution time of individual operations with TensorFlow?
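For context, that v1 approach can be sketched as follows under the tf.compat.v1 layer of a TF 2.x install; the placeholder shapes and variable names here are illustrative only:

```python
import tensorflow as tf

# Minimal sketch of the v1-style tracing via the compat layer, assuming
# TF 2.x is installed; shapes and names are illustrative only.
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=(None, 4))
w = tf.compat.v1.get_variable('w', shape=(4, 2))
y = tf.matmul(x, w)

run_options = tf.compat.v1.RunOptions(
    trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
run_metadata = tf.compat.v1.RunMetadata()

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]},
             options=run_options, run_metadata=run_metadata)

# run_metadata.step_stats now holds per-op timings, grouped by device
for dev in run_metadata.step_stats.dev_stats:
    for node in dev.node_stats:
        print(dev.device, node.node_name, node.all_end_rel_micros)
```

The collected step_stats is what the linked answer converts into a Chrome timeline, giving exactly the per-op, per-device breakdown I am after.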
However, in v2 there are no sessions anymore. Instead you build and train a model like this:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(train_dataset, epochs=2)
The only option I could find is the profiler API in tensorflow_core.python.eager.profiler. With it you get a Trace protobuf object that contains events with durations. However, the events I get are named 'Model', 'BatchV2', 'TensorSlice', 'Prefetch', 'MemoryCacheImpl', 'MemoryCache', 'TFRecord', 'Shuffle', 'Map', 'FlatMap', '_Send', 'ParallelMap', 'NotEqual', 'ParallelInterleaveV2', 'LogicalAnd' and no longer have any clear relation to the layers.
How do I get a proper trace for an arbitrary model that shows the runtime, device, and thread for every op?