
I used Caffe for some time but am now using TensorFlow. Caffe has a hyperparameter `iter_size`, which accumulates gradients over `iter_size x batch_size` instances. `iter_size` is used when GPU memory is limited and there are not enough GPUs.

I am wondering whether we can do the same operation in TensorFlow. I have seen this question; it accumulates the gradients, but it does not reset the accumulated gradients to zero after applying them to the variables.
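For reference, the accumulation part of that solution can be sketched as follows (TensorFlow 1.x API; the toy model, the variable names, and the `iter_size` value are illustrative choices of mine, not code from the linked answer). Note that nothing here ever resets the accumulators to zero:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Toy model standing in for the real network.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

opt = tf.train.GradientDescentOptimizer(0.01)
tvars = tf.trainable_variables()
grads = tf.gradients(loss, tvars)

# One non-trainable accumulator variable per trainable variable.
accum = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
         for v in tvars]
accum_op = [a.assign_add(g) for a, g in zip(accum, grads)]

iter_size = 4  # plays the role of Caffe's iter_size
# Apply the averaged accumulated gradients.
apply_op = opt.apply_gradients(
    [(a / iter_size, v) for a, v in zip(accum, tvars)])
```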

LI Xuhong
  • The simplest solution could be: why not use a smaller batch size? If you use the Adam optimizer, it will adapt the learning rate to the variance. But if you need a fixed effective batch size for various reasons, you could do it as in that solution, but you would also need to run `sess.run(accumulated_gradient.initializer)` before each accumulation loop to reset the gradient to zero – Yaroslav Bulatov Dec 22 '16 at 18:08
  • Thanks, maybe I will try the Adam optimizer later... but before that, I tried adding the accumulated_gradient to `tf.GraphKeys.LOCAL_VARIABLES` and then running `sess.run(tf.initialize_local_variables())` to reset the gradient to zero. (I've checked that there are no other local variables.) I think this is what you mean by `sess.run(accumulated_gradient.initializer)`. It accumulates the gradients and resets them to zero, but unfortunately after about 50 iterations the GPU runs out of memory... I don't understand why. – LI Xuhong Dec 23 '16 at 10:28
  • BTW, `sess.run(tf.initialize_local_variables())` will append to the computation graph. Can you do `tf.get_default_graph().finalize()` before the first `.run` call to make sure you are not growing the graph? – Yaroslav Bulatov Dec 23 '16 at 16:36
  • You are right, I cannot: `RuntimeError: Graph is finalized and cannot be modified.` So what can I do? – LI Xuhong Dec 24 '16 at 00:10
  • Create all ops ahead of time (`initialize_local_variables` gives an op) – Yaroslav Bulatov Dec 24 '16 at 00:46
  • Of course I did call `initialize_local_variables` (as well as `initialize_all_variables`) before the loop. – LI Xuhong Dec 24 '16 at 12:18
  • Your `Graph is finalized` error message should tell you which line is responsible for the op creation – Yaroslav Bulatov Dec 24 '16 at 16:12
  • It's `sess.run([tf.initialize_local_variables()])` that is responsible for growing the graph. I tried putting only that line, `sess.run([tf.initialize_local_variables()])`, in the loop, but the GPU still ran out of memory. The only local variables in my program are the accumulated gradients. I don't understand; maybe I am not familiar enough with TensorFlow. – LI Xuhong Dec 24 '16 at 22:24
  • The solution is to do something like `init_op = tf.initialize_local_variables()`, then finalize the graph, then run `sess.run(init_op)` in the loop (see the sketch after these comments) – Yaroslav Bulatov Dec 24 '16 at 22:33
  • @Seven can you elaborate on what you did and post it as an answer? A code snippet would be greatly appreciated – Toke Faurby Jul 03 '17 at 04:56
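Putting the suggestions from these comments together, the missing piece might look like the following sketch, which continues the toy model from the question. The key point from the last comments is to build the reset op once, before finalizing the graph, and only run it inside the loop; `tf.variables_initializer` and `tf.global_variables_initializer` are the post-0.12 names of the initializer functions mentioned above, and all other names remain illustrative:

```python
import numpy as np

# Build every op before training, including the reset op, so the loop
# never adds nodes to the graph (graph growth is what leaked memory above).
reset_op = tf.variables_initializer(accum)   # zeros the accumulators
init_op = tf.global_variables_initializer()  # older TF: tf.initialize_all_variables()

sess = tf.Session()
sess.run(init_op)
tf.get_default_graph().finalize()  # any op creation after this raises RuntimeError

for step in range(100):
    for _ in range(iter_size):
        batch = {x: np.random.rand(8, 10).astype(np.float32),
                 y: np.random.rand(8, 1).astype(np.float32)}  # dummy data
        sess.run(accum_op, feed_dict=batch)
    sess.run(apply_op)  # one weight update per iter_size accumulation steps
    sess.run(reset_op)  # pre-built op: running it does not grow the graph
```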

0 Answers