
I have a question similar to this one.

Because I have limited resources and I work with a deep model (VGG-16), used to train a triplet network, I want to accumulate gradients over 128 batches of one training example each, and then propagate the error and update the weights.

It's not clear to me how to do this. I work with TensorFlow, but any implementation/pseudocode is welcome.
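The reason accumulation works: if a batch's loss is the sum of its per-example losses, its gradient is the sum of the per-example gradients, so 128 single-example gradients added together equal the gradient of one 128-example batch. A minimal plain-Python check of that identity, assuming a toy linear model `pred = w * x + b` with squared error (the model, data and function names are hypothetical, chosen only for illustration):

```python
# Toy check: summing per-example gradients == gradient of the summed batch loss.
# Hypothetical model: pred = w * x + b, per-example loss = (pred - y) ** 2


def example_grad(w, b, x, y):
    """Gradient of (w*x + b - y)**2 with respect to (w, b) for one example."""
    err = w * x + b - y
    return 2 * err * x, 2 * err


def batch_grad(w, b, xs, ys):
    """Gradient of the summed loss over the whole batch, computed in one pass."""
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    return gw, gb


def accumulated_grad(w, b, xs, ys):
    """Same gradient, built up one example at a time in zero-initialized accumulators."""
    acc_w, acc_b = 0.0, 0.0
    for x, y in zip(xs, ys):
        gw, gb = example_grad(w, b, x, y)
        acc_w += gw
        acc_b += gb
    return acc_w, acc_b


xs = [i / 128 for i in range(128)]   # 128 single-example "batches"
ys = [3.0 * x - 1.0 for x in xs]
print(batch_grad(0.5, 0.0, xs, ys))
print(accumulated_grad(0.5, 0.0, xs, ys))
```

If your loss is a per-batch mean rather than a sum, divide the accumulated gradient by the number of accumulated examples (128 here) to get the same result.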

Pop
Hello Lili



Let's walk through the code proposed in one of the answers you linked to:

import tensorflow as tf  # TensorFlow 1.x (graph mode) API

## Optimizer definition - nothing different from any classical example
opt = tf.train.AdamOptimizer()

## Retrieve all trainable variables you defined in your graph
tvs = tf.trainable_variables()
## Create a list of non-trainable variables with the same shapes as the trainable ones,
# initialized with 0s
accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]

## Call the optimizer's compute_gradients function to obtain the list of (gradient, variable) pairs
# (rmse is the loss tensor of your model)
gvs = opt.compute_gradients(rmse, tvs)

## Add each gradient to the matching zero-initialized accumulator
# (this works because accum_vars and gvs are in the same order)
accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]

## Define the training step (the part that updates the variable values)
train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])

This first part adds new variables and ops to your graph that allow you to:

  1. Accumulate the gradients in the list of variables accum_vars, using the ops accum_ops
  2. Update the model weights with the op train_step
  3. Reset the accumulators to zero with the ops zero_ops
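To make the roles of these ops concrete, here is a framework-free sketch in plain Python (not TensorFlow; the class and method names are hypothetical) that mirrors `zero_ops`, `accum_ops` and `train_step`. It uses plain SGD in place of Adam to keep the update rule obvious:

```python
class GradAccumulator:
    """Plain-Python stand-in for accum_vars and the zero/accumulate/apply ops."""

    def __init__(self, weights):
        self.weights = list(weights)            # plays the role of tvs
        self.accum = [0.0] * len(self.weights)  # plays the role of accum_vars

    def zero(self):
        """Like sess.run(zero_ops): reset every accumulator to 0."""
        self.accum = [0.0] * len(self.weights)

    def accumulate(self, grads):
        """Like sess.run(accum_ops, ...): add one batch's gradients."""
        self.accum = [a + g for a, g in zip(self.accum, grads)]

    def apply(self, lr):
        """Like sess.run(train_step), but with plain SGD instead of Adam."""
        self.weights = [w - lr * a for w, a in zip(self.weights, self.accum)]


acc = GradAccumulator([1.0, -2.0])
acc.zero()
for g in ([0.5, 0.25], [0.5, -0.75]):  # two made-up single-example gradients
    acc.accumulate(g)
print(acc.weights)  # weights are unchanged until apply() is called
acc.apply(lr=0.1)
print(acc.weights)
```

The point of the design is that accumulating never touches the weights; only the single apply step does, once per group of examples.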

Then, to use it when training, you have to follow these steps (still from the answer you linked):

## The while loop for training
while ...:
    # Run zero_ops to reset the accumulators to 0
    sess.run(zero_ops)
    # Accumulate the gradients 'n_minibatches' times in accum_vars using accum_ops
    for i in range(n_minibatches):
        sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
    # Run the train_step op to update the weights based on your accumulated gradients
    sess.run(train_step)
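With `sess.run(train_step)` outside the inner loop, each pass of the `while` loop performs exactly one weight update built from `n_minibatches` single-example gradients. The accumulated value is a sum; if you want the update to match a mean-reduced loss over the same examples, divide the accumulators by `n_minibatches` before applying. A plain-Python sketch of that loop, assuming a hypothetical one-parameter model `pred = w * x` trained with SGD in place of the real network and Adam:

```python
def grad(w, x, y):
    """Gradient of (w*x - y)**2 with respect to w for one example."""
    return 2 * (w * x - y) * x


def train(xs, ys, n_minibatches, steps, lr):
    w, updates = 0.0, 0
    for step in range(steps):                  # the outer `while ...:` loop
        accum = 0.0                            # zero_ops
        for i in range(n_minibatches):         # accum_ops, one example at a time
            j = (step * n_minibatches + i) % len(xs)
            accum += grad(w, xs[j], ys[j])
        w -= lr * accum / n_minibatches        # train_step on the mean gradient
        updates += 1
    return w, updates


xs = [0.1 * k for k in range(1, 11)]
ys = [2.0 * x for x in xs]                     # the true weight is 2.0
w, updates = train(xs, ys, n_minibatches=5, steps=200, lr=0.5)
print(w, updates)                              # one update per outer iteration
```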
Kraigolas
Pop
So you left `sess.run(train_step)` outside of the loop. That means the weight update will occur after calculating the gradients of the last batch, is that correct? If we put it inside the loop, it will happen after each epoch, right? – ARAT May 20 '19 at 15:38

TensorFlow 2.0 compatible answer: in line with Pop's answer above and the explanation provided on the TensorFlow website, below is code for accumulating gradients in TensorFlow 2.0:

def train(epochs, accum_steps=128):
  # accum_steps: number of batches to accumulate before each update (128 in the question)
  # assumes mnist_model is already built, so trainable_variables is not empty
  tvs = mnist_model.trainable_variables
  # Create the accumulators once, outside the training loop, initialized with 0s
  accum_vars = [tf.Variable(tf.zeros_like(tv), trainable=False) for tv in tvs]
  for epoch in range(epochs):
    for (batch, (images, labels)) in enumerate(dataset):
      with tf.GradientTape() as tape:
        logits = mnist_model(images, training=True)
        loss_value = loss_object(labels, logits)

      loss_history.append(loss_value.numpy().mean())
      grads = tape.gradient(loss_value, tvs)
      # Add this batch's gradients to the accumulators
      for accum_var, grad in zip(accum_vars, grads):
        accum_var.assign_add(grad)

      # Every accum_steps batches, apply the accumulated gradients, then reset them to 0
      if (batch + 1) % accum_steps == 0:
        optimizer.apply_gradients(zip([v.read_value() for v in accum_vars], tvs))
        for accum_var in accum_vars:
          accum_var.assign(tf.zeros_like(accum_var))

    print('Epoch {} finished'.format(epoch))

# call the above function
train(epochs=3)

Complete code can be found in this GitHub Gist.
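In the eager (TF 2.x) style, a natural way to schedule updates is a batch counter: apply and reset the accumulators every `accum_steps` batches (`accum_steps` is a hypothetical parameter name). One detail worth deciding is what happens when an epoch's batch count is not a multiple of `accum_steps`: the leftover gradients can be flushed at the end of the epoch or left in the accumulators. A plain-Python sketch (hypothetical function, not a TensorFlow API) that lists the size of each group that gets applied:

```python
def update_schedule(n_batches, accum_steps, flush_remainder=True):
    """Return, for one epoch, the sizes of the gradient groups that get applied."""
    groups, pending = [], 0
    for batch in range(n_batches):
        pending += 1                          # accumulate this batch's gradients
        if (batch + 1) % accum_steps == 0:    # counter test used in a TF2-style loop
            groups.append(pending)            # i.e. optimizer.apply_gradients(...)
            pending = 0                       # reset the accumulators
    if flush_remainder and pending:
        groups.append(pending)                # apply the leftover partial group
    # with flush_remainder=False the leftover stays pending (not applied this epoch)
    return groups


print(update_schedule(300, 128))                         # [128, 128, 44]
print(update_schedule(300, 128, flush_remainder=False))  # [128, 128]
```

If you flush a partial group of summed gradients, its update is smaller than a full group's; dividing by the group size instead of `accum_steps` keeps the scale consistent.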