
Does TensorFlow utilize CUDA streams automatically for concurrent execution of the computation graph on a single GPU, or should streams be assigned manually to ops/tensors?

Henry Chinner

1 Answer

For now, TensorFlow uses only one compute stream and multiple copy streams. Some kernels may choose to use multiple streams for computation, while maintaining single-stream semantics.
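
As a rough, hedged illustration (not part of the original answer; TF 2.x Python API), here is a graph with two independent matmuls. Per the statement above, both GPU kernels are issued on the same compute stream and so run back-to-back, even though nothing in the graph orders them; only host/device copies get their own streams.

    import tensorflow as tf

    # Two independent branches: nothing forces an ordering between x and y,
    # but both GPU kernels are dispatched onto the single compute stream.
    @tf.function
    def two_branches(a, b, c, d):
        x = tf.matmul(a, b)   # branch 1
        y = tf.matmul(c, d)   # branch 2, independent of branch 1
        return x, y

    a, b, c, d = (tf.random.normal([2048, 2048]) for _ in range(4))
    x, y = two_branches(a, b, c, d)

Running a script like this under a GPU profiler (e.g. Nsight Systems) is the usual way to see which streams the kernels and copies actually land on.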

Our experiments showed that enabling multi-stream automatically does not bring much performance gain, since most of our kernels are large enough to utilize all processors on the GPU. Enabling multi-stream would also prevent our current design from recycling GPU memory aggressively.

This is a decision we might revisit in the future. If that happens, TensorFlow would likely assign ops/kernels to different CUDA streams automatically, without exposing them to users.

zhengxq
    Do you know if this is still the case, with logical devices? In other words, do all logical devices on the same physical device share the same compute stream? – Neil Jan 22 '21 at 19:28
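
For context on the comment, here is a hedged sketch (TF 2.x API; the memory limits are illustrative) of splitting one physical GPU into two logical devices. Whether ops placed on the two logical devices share a single compute stream is exactly what the comment asks.

    import tensorflow as tf

    # Split the first physical GPU into two logical GPUs with illustrative
    # memory limits (in MB). This must run before the GPU is initialized.
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=2048),
             tf.config.LogicalDeviceConfiguration(memory_limit=2048)],
        )
        print(tf.config.list_logical_devices("GPU"))
        # e.g. /device:GPU:0 and /device:GPU:1 backed by one physical card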