Since there is currently no easy way to profile TensorFlow operations (see: Can I measure the execution time of individual operations with TensorFlow?), can anyone help me understand the benefits of using segment operations (e.g. `segment_sum`) compared to using multiple operations on pre-segmented tensors? Would `segment_sum` be more efficient than `dynamic_partition` or `gather` followed by multiple `reduce_sum` calls? Would `segment_sum` be equally parallelizable?
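For concreteness, here is a minimal sketch of the two approaches being compared. This is illustrative only: it uses TF 1.x graph-mode APIs, and the toy data and segment layout are invented for the example.

```python
import tensorflow as tf

# Toy data: six values in three (sorted) segments.
data = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
segment_ids = tf.constant([0, 0, 1, 1, 1, 2])

# Approach 1: a single fused segment op.
seg = tf.segment_sum(data, segment_ids)

# Approach 2: pre-segment the tensor, then reduce each part separately.
parts = tf.dynamic_partition(data, segment_ids, num_partitions=3)
multi = tf.stack([tf.reduce_sum(p) for p in parts])

with tf.Session() as sess:
    print(sess.run(seg))    # [ 3. 12.  6.]
    print(sess.run(multi))  # [ 3. 12.  6.]
```

The fused op avoids materializing the partitioned intermediates, which is the intuitive argument in its favor; whether that translates into a measurable win is exactly what this question is asking.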

1 Answer
I've updated the SO question you link to with some information about the CPU inference profiling tools we've recently released: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/benchmark
Unfortunately the overall question is a lot harder to answer, since it depends on:
Whether you're focused on training, or inference.
If you're using a GPU, and if so what kind and how many.
Whether you're running distributed.
What your data looks like, and where the bottlenecks are.
What I usually end up doing is building small sub-graphs that are representative of the sort of ops I'm considering, and then timing how long they take on the sort of data I'll be feeding in. I know that isn't immediately helpful, since the experimentation can be time-consuming, but it is the best way to get an intuitive understanding of the optimal solution for your particular circumstances.
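As a concrete illustration of that workflow, a minimal timing sketch might look like the following (TF 1.x style; the tensor size, segment layout, and run count are placeholders you would match to your real data):

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder problem size: one million values in one thousand segments.
n, num_segments = 1000000, 1000
data = tf.constant(np.random.randn(n).astype(np.float32))
segment_ids = tf.constant(np.repeat(np.arange(num_segments), n // num_segments))

op = tf.segment_sum(data, segment_ids)

with tf.Session() as sess:
    sess.run(op)  # warm-up run: excludes one-time graph/kernel setup costs
    runs = 100
    start = time.time()
    for _ in range(runs):
        sess.run(op)
    print('mean wall time: %.5f s' % ((time.time() - start) / runs))
```

Note that each `sess.run` also copies the result back to the host, so on a GPU this measures end-to-end cost rather than the kernel alone; swapping in the `dynamic_partition`/`reduce_sum` variant from the question gives the comparison.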

- Just wondering why you mention training or inference as a factor? I'm asking about the properties of an operation, and I'm not actually using TF to implement CNNs (or other NNs). Regarding your other points, can TF automatically parallelize an operation (or even a subgraph) across multiple GPUs? I thought it cannot. Let's assume for the sake of this question that we are running each of the two cases on a single GPU (i.e. I am not going to compare the single-op solution to a multi-op solution distributed across multiple GPUs). – Andrzej Pronobis Jun 07 '16 at 01:38