A common and important question when developing DNNs is which operations take how long, and how they are distributed across devices and threads.
This used to be possible in TensorFlow v1 by passing tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE) to session.run(); see Can I measure the execution time of individual operations with TensorFlow?
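For context, that v1 approach can be sketched as follows under the tf.compat.v1 layer of a TF 2.x install; the placeholder shapes and variable names here are illustrative only:

```python
import tensorflow as tf

# Minimal sketch of the v1-style tracing via the compat layer, assuming
# TF 2.x is installed; shapes and names are illustrative only.
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=(None, 4))
w = tf.compat.v1.get_variable('w', shape=(4, 2))
y = tf.matmul(x, w)

run_options = tf.compat.v1.RunOptions(
    trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
run_metadata = tf.compat.v1.RunMetadata()

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]},
             options=run_options, run_metadata=run_metadata)

# run_metadata.step_stats now holds per-op timings, grouped by device
for dev in run_metadata.step_stats.dev_stats:
    for node in dev.node_stats:
        print(dev.device, node.node_name, node.all_end_rel_micros)
```

The collected step_stats is what the linked answer converts into a Chrome timeline, giving exactly the per-op, per-device breakdown I am after.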
However, in v2 there are no sessions anymore. Instead you build and train a model like this:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(train_dataset, epochs=2)
The only option I could find is the profiler API in tensorflow_core.python.eager.profiler. With it you get a Trace protobuf object that contains events with durations. However, the events I get are named 'Model', 'BatchV2', 'TensorSlice', 'Prefetch', 'MemoryCacheImpl', 'MemoryCache', 'TFRecord', 'Shuffle', 'Map', 'FlatMap', '_Send', 'ParallelMap', 'NotEqual', 'ParallelInterleaveV2', 'LogicalAnd' and no longer have any clear relation to the layers.
How do I get a proper trace for an arbitrary model that shows the runtime, device, and thread for every op?