
I am training an Inception-like model using TensorFlow r1.0 on an Nvidia Titan X GPU.

I added some summary operations to visualize the training procedure, using the following code:

    import tensorflow as tf

    def variable_summaries(var):
        """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
        with tf.name_scope('summaries'):
            mean = tf.reduce_mean(var)
            tf.summary.scalar('mean', mean)
            with tf.name_scope('stddev'):
                stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
            tf.summary.scalar('stddev', stddev)
            tf.summary.scalar('max', tf.reduce_max(var))
            tf.summary.scalar('min', tf.reduce_min(var))
            tf.summary.histogram('histogram', var)

When I run these operations, training one epoch takes about 400 seconds. But when I turn them off, one epoch takes just 90 seconds.

How can I optimize the graph to minimize the time cost of the summary operations?

  • maybe compute summaries less often? Also, TF 1.0 refactors things to make them more efficient -- when using hooks, summaries are computed at the same time as other tensors, so all the intermediate quantities are reused (see the sketch after these comments) – Yaroslav Bulatov Feb 23 '17 at 04:12
  • I am using TF 1.0. Could you please make it more clear how to use hooks? I tried to use CPU to compute summaries, but it did not help much. I guess it is because of the data transfer between GPU and CPU. @YaroslavBulatov – Da Tong Feb 23 '17 at 05:30
  • before moving to hooks, can you just reduce the number of times you compute summaries? – Yaroslav Bulatov Feb 23 '17 at 17:05
  • Oh, yes, of course I can. But actually, I just compute the summaries every epoch, not every batch. If I reduce the summaries to every 10 epochs, I am afraid that I will lose some information about the training procedure. – Da Tong Mar 02 '17 at 01:28
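
For reference, here is a minimal sketch of the hooks approach mentioned in the comments above, assuming a recent TF 1.x release (the tiny stand-in graph, the save_steps value, and the log directory are illustrative placeholders, not from the original question). tf.train.SummarySaverHook fetches the summary op in the same session.run call as the training op, so intermediate tensors are reused rather than recomputed:

    import tensorflow as tf

    # Sketch of the hooks approach: SummarySaverHook fetches the merged summary
    # op in the SAME session.run call as the training op, so intermediate
    # tensors are computed once instead of in a separate run.
    global_step = tf.train.get_or_create_global_step()
    x = tf.Variable(0.0)
    tf.summary.scalar('x', x)
    # Stand-in for a real training op; it must also advance the global step.
    train_op = tf.group(tf.assign_add(x, 1.0), tf.assign_add(global_step, 1))

    summary_hook = tf.train.SummarySaverHook(
        save_steps=100,                      # write summaries every 100 steps
        output_dir='/tmp/train_logs',
        summary_op=tf.summary.merge_all())

    with tf.train.MonitoredTrainingSession(hooks=[summary_hook]) as sess:
        for _ in range(1000):
            sess.run(train_op)               # summaries piggyback on this call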

1 Answer


Summaries of course slow down the training process, because you do more operations and you need to write them to disk. Also, histogram summaries slow training down even more, because histograms require much more data to be copied from GPU to CPU than scalar values do. So I would try to use histogram logging less often than the rest; that could make some difference.
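
One hedged sketch of that idea, assuming TF 1.x: register scalar and histogram summaries in separate, arbitrarily named collections so each group gets its own merged op that can be fetched at its own frequency (the collection names and example variable below are placeholders, not from the original post):

    import tensorflow as tf

    # Sketch: register scalar and histogram summaries in separate collections
    # so each group can be merged and fetched at its own frequency.
    # (The collection names are arbitrary placeholders, not TF built-ins.)
    def variable_summaries_split(var):
        with tf.name_scope('summaries'):
            mean = tf.reduce_mean(var)
            tf.summary.scalar('mean', mean, collections=['scalar_summaries'])
            stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
            tf.summary.scalar('stddev', stddev, collections=['scalar_summaries'])
            tf.summary.histogram('histogram', var, collections=['histogram_summaries'])

    variable_summaries_split(tf.Variable(tf.zeros([10])))  # example variable

    # Fetch scalar_summary_op every epoch, histogram_summary_op every N epochs.
    scalar_summary_op = tf.summary.merge(tf.get_collection('scalar_summaries'))
    histogram_summary_op = tf.summary.merge(tf.get_collection('histogram_summaries'))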

The usual solution is to compute summaries only every X batches. Since you compute summaries only once per epoch, not every batch, it might be worth logging summaries even less often.
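
For illustration, a minimal self-contained training loop that pays the summary cost only on every 100th step, assuming TF 1.x (the tiny stand-in graph, step counts, and log directory are placeholders):

    import tensorflow as tf

    # Minimal self-contained loop: fetch summaries only every 100th step.
    x = tf.Variable(0.0)
    train_op = tf.assign_add(x, 1.0)         # stand-in for a real training op
    tf.summary.scalar('x', x)
    merged = tf.summary.merge_all()

    with tf.Session() as sess:
        writer = tf.summary.FileWriter('/tmp/train_logs', sess.graph)
        sess.run(tf.global_variables_initializer())
        for step in range(1000):
            if step % 100 == 0:
                # Summary step: pays the extra GPU->CPU copy and disk write.
                _, summ = sess.run([train_op, merged])
                writer.add_summary(summ, step)
            else:
                sess.run(train_op)           # cheap step: no summaries fetched
        writer.close()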

It depends on how many batches you have in your dataset, but usually you don't lose much information by gathering logs a bit less frequently.

Matěj Račinský
  • Is there a way of keeping histograms on the GPU and only copying them back for logging every few epochs, while still keeping the full logging data? – Gulzar Feb 13 '21 at 08:06