I am trying to train a recurrent neural network where the input is an image and the output is a probability blob. It is a pretty simple network, with Convolution, Pooling, and ReLU layers.
I have a set of convolution/ReLU blocks that are repeated several times to get a sharper blob. I can train successfully if I don't use shared weights, but if I do, the training always results in NaNs. Are there any special considerations to take note of when using shared weights to prevent NaNs? Could it be the learning rate I set for each conv block? Should the learning rate for the shared weights be smaller? A rough sketch of the kind of setup I mean is below.
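To make the question concrete, here is a minimal, purely illustrative sketch (written in PyTorch; my actual network is different and all layer names, channel counts, and learning rates here are made up) of what I mean by one conv/ReLU block whose weights are reused several times, with a separate learning rate for the shared parameters:

```python
import torch
import torch.nn as nn

class SharedRefineNet(nn.Module):
    def __init__(self, channels=32, num_repeats=3):
        super().__init__()
        # Initial conv/pool/ReLU stage (sizes are illustrative only).
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        # One conv/ReLU block whose weights are shared across all repeats.
        self.shared_block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.num_repeats = num_repeats
        # 1x1 conv producing the single-channel probability blob.
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        x = self.stem(x)
        for _ in range(self.num_repeats):
            # The same weights are applied on every pass.
            x = self.shared_block(x)
        return torch.sigmoid(self.head(x))

model = SharedRefineNet()

# Separate parameter groups so the shared weights can get a smaller learning rate.
shared_params = list(model.shared_block.parameters())
other_params = [p for n, p in model.named_parameters()
                if not n.startswith("shared_block")]
optimizer = torch.optim.SGD(
    [{"params": other_params, "lr": 1e-2},
     {"params": shared_params, "lr": 1e-3}],  # e.g. 10x smaller for shared weights
    momentum=0.9,
)
```

The part I am unsure about is the last bit: whether the shared block's learning rate should be scaled down (since its gradient accumulates contributions from every repeat), or whether something else entirely is causing the NaNs.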