
I am comparing two training runs of a tf.estimator.Estimator model fed by a tf.data.Dataset iterator. The training is handled by tf.estimator.train_and_evaluate().

When I look at the trace of a single training step, I notice that GPU training is dominated by the IteratorGetNext call, which takes 4.5 seconds. The same call in the CPU-only run takes only about 100 µs. See the following screenshots of the traces:

cpu training:

[trace screenshot of a CPU training step]

gpu training:

[trace screenshot of a GPU training step]

What could be causing this, and how can I improve the speed of the GPU's IteratorGetNext?

zephyrus
  • I guess you have issues feeding the gpu. https://stackoverflow.com/questions/48715062/tensorflow-performance-bottleneck-on-iteratorgetnext – user1462442 Mar 29 '19 at 02:07
  • This would be my guess as well -- but is there any way of prefetching the data on the GPU memory? – zephyrus Mar 29 '19 at 02:31
  • I did send you a link. The guy answered with this official doc https://www.tensorflow.org/guide/performance/datasets – user1462442 Mar 29 '19 at 02:33
  • I've already implemented all of those suggestions -- but AFAIK these only improve the speed if the bottleneck is cpu compute, not communication time between gpu and cpu. – zephyrus Mar 29 '19 at 02:36
  • I would think the slowdown was due to CPU compute time were it not for the fact that the CPU-only trained model was so fast, indicating that the CPU is more than capable of doing the pre-processing I put in the input pipeline. – zephyrus Mar 29 '19 at 02:36
  • 1
    Umm, I do not think anyone can extrapolate anything from your opening post. There isn't any indication of how long your model takes to compute nor how big it is. I am not sure if those graphs are complete or a segment. – user1462442 Mar 29 '19 at 02:45
  • These indicate a single training step. The point is only to illustrate the relative speed of the same `IteratorGetNext` step on the CPU vs. the GPU system. – zephyrus Mar 29 '19 at 02:50
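
For concreteness, here is a minimal sketch of the GPU-side prefetching idea raised in the comments, assuming a TF 1.13-era tf.data pipeline. The in-memory dummy data, shapes, and batch size are placeholders, not the asker's actual setup:

```python
import numpy as np
import tensorflow as tf

def input_fn():
    # Dummy in-memory data; a real pipeline would read and parse files here.
    features = np.random.rand(1024, 28, 28, 1).astype(np.float32)
    labels = np.random.randint(0, 10, size=1024).astype(np.int32)

    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(1024).repeat().batch(32)
    # Host-side prefetch overlaps preprocessing with the previous step.
    dataset = dataset.prefetch(2)
    # Stage batches directly into GPU memory so IteratorGetNext does not
    # block on the host-to-device copy; this must be the final transformation.
    dataset = dataset.apply(
        tf.data.experimental.prefetch_to_device("/gpu:0", buffer_size=2))
    return dataset
```

Whether Estimator places the iterator so that this actually helps depends on the TF version; if it does not, the same transformation can be tried in a standalone tf.data loop to isolate the host-to-device copy cost.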

0 Answers