How to accumulate gradient across mini-batch and then back-propagation in Chainer?

Question

I am classifying video sequences, and I need two things:

1. Because GPU memory is limited, I want to accumulate gradients across several mini-batches, average them, and then apply a single parameter update.
2. I need to know how to shuffle the order of the mini-batches without shuffling inside each mini-batch, because I want each video sequence to keep its frame order.

score 0 · Accepted Answer · answered Feb 20 '18 at 12:44


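Question 1: You can run forward and backward for each mini-batch without calling optimizer.update(). Once you have repeated forward and backward for the required number of mini-batches, call optimizer.update() once to update the parameters from the accumulated gradients. Clear the gradients (for example with model.cleargrads()) only before the first mini-batch of each group, not between mini-batches, otherwise nothing accumulates.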

If you want to do this with the trainer module, I think you need to subclass StandardUpdater and define your own updater class that does the above.

Question 2: Are you using the trainer module? If so, you can define your own iterator to achieve this. See also below for a reference on how to define an iterator class.
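To illustrate the ordering logic such an iterator would need, here is a plain-Python sketch that is independent of Chainer. The function name `batches_in_shuffled_order` is hypothetical, and it assumes that consecutive dataset indices are consecutive frames and that mini-batch boundaries line up with the sequences you want to keep intact.

```python
import random


def batches_in_shuffled_order(n_samples, batch_size, seed=None):
    """Yield one list of dataset indices per mini-batch.

    Each mini-batch is a run of consecutive indices, so the order inside
    it is preserved; only the order of the mini-batches is shuffled.
    """
    starts = list(range(0, n_samples, batch_size))
    random.Random(seed).shuffle(starts)  # shuffle the batch order only
    for start in starts:
        yield list(range(start, min(start + batch_size, n_samples)))


# Example: 10 frames in mini-batches of 4 (the last batch is shorter).
for batch in batches_in_shuffled_order(10, 4, seed=0):
    print(batch)
```

A custom iterator class could presumably use these index lists to pick the samples for each call to its `next()` method, reshuffling the batch order at the start of every epoch.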

corochann

About question 2: the difficult part is how to write such a keep_inside_batch_order iterator as a parallel iterator. I mean using multiple processes to fetch data in parallel while keeping shuffle=False inside each mini-batch. – machen Mar 02 '18 at 03:24
I may need to know your situation more concretely to answer this. Could you provide the details in a new question thread? – corochann Mar 02 '18 at 09:09
