I am classifying video sequences, and I need two things:
Because GPU memory is limited, I want to accumulate the gradients over several mini-batches, average them, and only then do the backpropagation update.
I need to know how to shuffle the order of the mini-batches without shuffling inside each mini-batch, because I want each video sequence to keep its order.
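
To make both requirements concrete, here is a minimal, framework-free sketch of the behaviour I am after. Everything in it is an assumption made for illustration: the helper names (`make_batches`, `shuffled_batches`, `grad_mse`, `train_epoch`), the toy one-parameter model `y = w * x`, and the plain gradient-descent update all stand in for a real network and optimizer.

```python
import random


def make_batches(items, batch_size):
    """Cut an ordered sequence into consecutive mini-batches.

    The order of the items inside each mini-batch is left untouched.
    """
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def shuffled_batches(items, batch_size, rng):
    """Shuffle the order of the mini-batches, not their contents."""
    batches = make_batches(items, batch_size)
    rng.shuffle(batches)  # permutes whole batches only
    return batches


def grad_mse(w, batch):
    """Gradient of the mean squared error of the toy model y = w * x.

    `batch` is a list of (x, y) pairs; this is a hypothetical stand-in
    for the backward pass of a real network.
    """
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_epoch(w, batches, accum_steps, lr):
    """Accumulate gradients over `accum_steps` mini-batches, average them,
    and apply a single update per group."""
    accum, count = 0.0, 0
    for batch in batches:
        accum += grad_mse(w, batch)  # accumulate, no update yet
        count += 1
        if count == accum_steps:
            w -= lr * (accum / count)  # one update with the averaged gradient
            accum, count = 0.0, 0
    if count:  # leftover mini-batches at the end of the epoch
        w -= lr * (accum / count)
    return w


if __name__ == "__main__":
    frames = list(range(10))
    print(shuffled_batches(frames, 3, random.Random(0)))

    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(train_epoch(0.0, make_batches(data, 2), accum_steps=2, lr=0.01))
```

When all mini-batches have the same size, averaging the accumulated gradients gives the same update as one large batch would, which is the effect I want without needing the memory for the large batch.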