This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and the resulting timeline contains these rows:
/gpu:0/stream:all Compute(pid 5)
/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)
My questions:
a) What is the meaning of each row?
b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?
c) Why are their execution times different, namely 0.072 ms and 0.094 ms?
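In case it helps with answering, here is a minimal sketch of how the per-row totals could be tallied from the trace file, assuming the profile was saved in Chrome trace JSON format (a `traceEvents` list where `ph == "M"` metadata events name each pid and `ph == "X"` events carry a duration `dur` in microseconds). The function name `summarize_trace` and the sample data are hypothetical, and the durations in the sample are made up rather than taken from my run.

```python
import json
from collections import defaultdict


def summarize_trace(trace):
    """Return {row name: (event count, total duration in ms)} for a trace dict.

    Assumes Chrome trace format: metadata events (ph == "M") name each pid,
    complete events (ph == "X") carry a duration "dur" in microseconds.
    """
    events = trace.get("traceEvents", [])

    # Map each pid to the row name declared by its process_name metadata event.
    names = {}
    for event in events:
        if event.get("ph") == "M" and event.get("name") == "process_name":
            names[event["pid"]] = event["args"]["name"]

    # Count the complete events in each row and sum their durations.
    totals = defaultdict(lambda: [0, 0.0])
    for event in events:
        if event.get("ph") == "X":
            row = names.get(event["pid"], "pid %s" % event["pid"])
            totals[row][0] += 1
            totals[row][1] += event.get("dur", 0) / 1000.0  # microseconds to ms
    return {row: (count, ms) for row, (count, ms) in totals.items()}


# Hypothetical sample: the row names mirror the question, the timings are invented.
SAMPLE_TRACE = {
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 3,
         "args": {"name": "/job:localhost/replica:0/task:0/gpu:0 Compute"}},
        {"ph": "M", "name": "process_name", "pid": 5,
         "args": {"name": "/gpu:0/stream:all Compute"}},
        {"ph": "X", "name": "MatMul", "pid": 3, "tid": 0, "ts": 100, "dur": 80},
        {"ph": "X", "name": "MatMul", "pid": 5, "tid": 0, "ts": 120, "dur": 50},
    ]
}

if __name__ == "__main__":
    # To inspect a real file instead, load it first, e.g.:
    #   with open("timeline.json") as f:  # hypothetical file name
    #       trace = json.load(f)
    for row, (count, ms) in sorted(summarize_trace(SAMPLE_TRACE).items()):
        print("%s: %d event(s), %.3f ms" % (row, count, ms))
```

Comparing the `ts` and `dur` values of the same op under the two pids this way would show whether the two rows measure the same interval or different ones.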