
It appears that there are already a couple of questions on how to accumulate gradients in TensorFlow. Here's the original and a duplicate.

The accepted recommendation, taken from this issue, is to do the following:

opt = tf.train.AdamOptimizer()
tvs = tf.trainable_variables()
# One non-trainable accumulator per trainable variable, initialised to zeros.
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [av.assign(tf.zeros_like(av)) for av in accum_vars]
gvs = opt.compute_gradients(rmse, tvs)
# Add each minibatch gradient into its accumulator.
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
# Apply the accumulated gradients to the corresponding variables.
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

In the training loop we have:

while True:
    sess.run(zero_ops)
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    sess.run(train_step)
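As a sanity check on the idea behind this loop (not the TensorFlow code itself), here is a minimal numpy sketch showing the relation being exploited: for a mean-squared-error loss on a linear model, the full-batch gradient equals the average of equal-sized minibatch gradients. Note the snippet above sums without dividing by `n_minibatches`, so the effective step size scales with the number of minibatches unless you average. The names `grad`, `Xb`, `yb` below are my own illustration, not from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

def grad(Xb, yb, w):
    # Gradient of the mean squared error (1/n) * ||Xb @ w - yb||^2 w.r.t. w.
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

# Full-batch gradient in one shot.
full = grad(X, y, w)

# Accumulate over equal-sized minibatches, then average.
n_minibatches = 4
acc = np.zeros_like(w)
for Xb, yb in zip(np.split(X, n_minibatches), np.split(y, n_minibatches)):
    acc += grad(Xb, yb, w)
acc /= n_minibatches  # each minibatch gradient is itself a mean

assert np.allclose(acc, full)
```

This is why accumulation lets you emulate a large batch on limited memory: the per-minibatch gradients combine exactly into the large-batch gradient.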

I managed to implement a minimal example of this in a Jupyter notebook, but I'm bothered by the ad-hoc nature of the solution. Moreover, as shown in the notebook, the accumulator poses a problem when training is run a second time. It's not clear to me right now how I should address this.

Aidan Rocke

1 Answer


So I found the solution to my problem and posted it in a public gist. The key is to reset the default graph before building a new graph and running training a second time in the same notebook.

So we have:

# Clear any ops and variables left over from a previous run before rebuilding.
tf.reset_default_graph()

model = mnist_network(seed=42)

Aidan Rocke