Understanding TensorFlow profiling results

Question

This example shows how to profile TensorFlow programs. I used this tool to profile my program, a simple LSTM, and the results are shown as:

/gpu:0/stream:all Compute(pid 5)

/job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)

My questions:

a) What is the meaning of each row?

b) In particular, what is the difference between /gpu:0/stream:all Compute(pid 5) and /job:localhost/replica:0/task:0/gpu:0 Compute(pid 3)?

c) Why are their execution times different, namely 0.072 ms and 0.094 ms?

score 2 · Accepted Answer · answered Apr 18 '17 at 01:22


Here's an update from one of the engineers:

The '/gpu:0/stream:*' timelines are hardware tracing of CUDA kernel execution times.

The '/gpu:0' lines are the TF software device enqueueing the ops on the CUDA stream (usually takes almost zero time).
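To compare the two kinds of rows outside the trace viewer, one option is to sum the event durations per row directly from the trace file. The sketch below is not part of the engineer's reply. It assumes the profile was saved in Chrome trace-event JSON, where `process_name` metadata events label each pid and complete events (`"ph": "X"`) carry a `dur` in microseconds. The helper name `total_duration_per_row`, the sample trace, and the `MatMul` op name are made up for illustration; the two durations are borrowed from the question, in the order it lists them.

```python
import json
from collections import defaultdict


def total_duration_per_row(trace_json):
    """Sum the durations (microseconds) of complete events for each process row."""
    events = json.loads(trace_json)["traceEvents"]

    # Metadata events ("ph": "M", name "process_name") map a pid to its row label.
    row_names = {
        e["pid"]: e["args"]["name"]
        for e in events
        if e.get("ph") == "M" and e.get("name") == "process_name"
    }

    totals = defaultdict(int)
    for e in events:
        # Complete events ("ph": "X") carry a start time "ts" and a duration "dur".
        if e.get("ph") == "X":
            label = row_names.get(e["pid"], "pid {}".format(e["pid"]))
            totals[label] += e.get("dur", 0)
    return dict(totals)


# Hypothetical two-row trace, for illustration only.
SAMPLE_TRACE = json.dumps({
    "traceEvents": [
        {"ph": "M", "name": "process_name", "pid": 5,
         "args": {"name": "/gpu:0/stream:all Compute"}},
        {"ph": "M", "name": "process_name", "pid": 3,
         "args": {"name": "/job:localhost/replica:0/task:0/gpu:0 Compute"}},
        {"ph": "X", "name": "MatMul", "pid": 5, "tid": 0, "ts": 100, "dur": 72},
        {"ph": "X", "name": "MatMul", "pid": 3, "tid": 0, "ts": 90, "dur": 94},
    ]
})

totals = total_duration_per_row(SAMPLE_TRACE)
for row, microseconds in sorted(totals.items()):
    print("{}: {:.3f} ms".format(row, microseconds / 1000))
```

For a real profile, read the saved trace file into a string and pass it in place of `SAMPLE_TRACE`. The per-row totals only show that the two rows measure different things; they do not say which part of the software-side time is launch overhead.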

Pete Warden

So, does the number in the `/gpu:0` line include the GPU kernel launch time? – pgplus1628 Apr 18 '17 at 02:23
