
My understanding is that TensorFlow creates two thread pools on each device: one for intra-op parallelism and one for inter-op parallelism.

Suppose there are 3 independent ops A, B, and C placed on /gpu:0, and intra_op_parallelism_threads=5. If A and B each have a single-threaded GPU kernel implementation and C has a multi-threaded one, does that mean they can all run in parallel on the same device, with A and B each using just 1 GPU thread while C uses up to 3 GPU threads?

Now suppose inter_op_parallelism_threads=2: does that mean that only 2 of the 3 ops can be evaluated simultaneously on /gpu:0, so in the example above it might be A+B, B+C, or A+C, depending on which ops get scheduled first? A minimal sketch of the setup follows below.
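
For concreteness, here is a minimal sketch of what I have in mind, using the TF 1.x Python API. The three matmuls are just stand-ins for A, B, and C (real single- vs. multi-threaded kernels would of course differ); the two ConfigProto fields are the settings the question is about:

    import tensorflow as tf

    # Three independent ops pinned to the same GPU. The matmuls are
    # placeholders for A, B and C; they share no data dependencies,
    # so the runtime is free to schedule them concurrently.
    with tf.device('/gpu:0'):
        a = tf.matmul(tf.random_normal([512, 512]), tf.random_normal([512, 512]), name='A')
        b = tf.matmul(tf.random_normal([512, 512]), tf.random_normal([512, 512]), name='B')
        c = tf.matmul(tf.random_normal([512, 512]), tf.random_normal([512, 512]), name='C')

    # The two thread-pool settings from the question.
    config = tf.ConfigProto(intra_op_parallelism_threads=5,
                            inter_op_parallelism_threads=2)

    with tf.Session(config=config) as sess:
        # If inter_op_parallelism_threads really caps concurrent ops at 2,
        # only two of these three fetches could start at the same time.
        sess.run([a, b, c])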

Note: I'm trying to make sense of @mrry's answer to this question: "Tensorflow: executing an ops with a specific core of a CPU".

intra_op_parallelism_threads affects the number of threads in the Eigen thread pool, so it has no effect on GPU operations. GPU ops tend to grab the whole GPU. – Yaroslav Bulatov Aug 27 '16 at 18:27

0 Answers