My question is similar to this one: How to update model parameters with accumulated gradients?
I have a large network and a very small batch size. To work around this, I want to accumulate gradients over multiple forward/backward passes and then update the parameters once using the mean gradient.
However, my network contains batch normalization (BN) layers, which compute their statistics per micro-batch. How should I handle them when accumulating gradients? A sketch of the accumulation loop I have in mind is below.
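For concreteness, this is the kind of loop I mean. It is a minimal sketch, assuming PyTorch (no framework is specified above) with a toy model standing in for the real network; all names and sizes here are illustrative only. Note how the BN layer still only ever sees one tiny micro-batch per forward pass.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real (much larger) network and data; illustrative only.
model = nn.Sequential(nn.Linear(10, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

accum_steps = 8        # number of micro-batches whose gradients are averaged per update
micro_batch_size = 4   # the "very small" batch size that fits in memory

optimizer.zero_grad()
for step in range(accum_steps * 10):
    x = torch.randn(micro_batch_size, 10)
    y = torch.randint(0, 2, (micro_batch_size,))
    loss = criterion(model(x), y)          # BN statistics come from this micro-batch only
    (loss / accum_steps).backward()        # scale so the summed grads equal the mean gradient
    if (step + 1) % accum_steps == 0:
        optimizer.step()                   # one parameter update with the accumulated mean gradient
        optimizer.zero_grad()
```

The parameter update behaves as if the batch were `accum_steps * micro_batch_size` samples, but the BN running mean/variance (and the normalization itself) are still computed from only `micro_batch_size` samples at a time, which is what I am unsure how to handle.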