
This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and the results are shown below:

/gpu:0/stream:all Compute(pid 5)

MatMul_AllCompute

/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)

MatMul_GpuCompute

My questions:

a) What is the meaning of each row?

b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?

c) Why are their execution times different, namely 0.072 ms and 0.094 ms?

pgplus1628

1 Answer


Here's an update from one of the engineers:

The '/gpu:0/stream:*' timelines are hardware tracing of CUDA kernel execution times.

The '/gpu:0' lines are the TF software device enqueueing the ops on the CUDA stream (usually takes almost zero time).
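
To compare the two kinds of rows outside the trace viewer, one option is to read the exported trace file and total the durations per row. The following is a minimal sketch, assuming the profile was saved in Chrome trace JSON format (a "traceEvents" list with "M" metadata events that name each pid and "X" events that carry "dur" in microseconds). The sample trace, its durations, and the helper name durations_by_row are made up for illustration and are not taken from the question's profile.

```python
import json
from collections import defaultdict

# Hypothetical trace in Chrome trace JSON format. The durations below are
# illustrative only; they are not the values from the question.
SAMPLE_TRACE = json.dumps({
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 3,
         "args": {"name": "/job:localhost/replica:0/task:0/gpu:0 Compute"}},
        {"ph": "M", "name": "process_name", "pid": 5,
         "args": {"name": "/gpu:0/stream:all Compute"}},
        {"ph": "X", "name": "MatMul", "pid": 3, "tid": 0, "ts": 100, "dur": 5},
        {"ph": "X", "name": "MatMul", "pid": 5, "tid": 0, "ts": 110, "dur": 50},
    ]
})


def durations_by_row(trace_json):
    """Return {(row name, op name): total duration in microseconds}."""
    events = json.loads(trace_json)["traceEvents"]

    # "M" (metadata) events map each pid to the row label shown in the viewer.
    row_names = {
        e["pid"]: e["args"]["name"]
        for e in events
        if e.get("ph") == "M" and e.get("name") == "process_name"
    }

    # "X" (complete) events carry the measured duration of one op.
    totals = defaultdict(int)
    for e in events:
        if e.get("ph") == "X":
            row = row_names.get(e["pid"], "pid %d" % e["pid"])
            totals[(row, e["name"])] += e["dur"]
    return dict(totals)


if __name__ == "__main__":
    for (row, op), dur in sorted(durations_by_row(SAMPLE_TRACE).items()):
        print("%s | %s | %d us" % (row, op, dur))
```

With a real trace, the same op name should show up once under the software device row (enqueue time) and once under the stream row (kernel time), which makes the two measurements easy to compare side by side.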

Pete Warden