I am very new to neural networks and was wondering why all of the examples of RNNs, especially char-rnns use the crossentropy loss function as their loss function. I have googled but can't seem to come across any discussions on the function in this context. I have been asked to motivate for its use and look at its advantages and disadvantages so any papers or sources that I could read through would be much appreciated.
1 Answer
Many sequence-to-sequence RNNs, and char-rnn in particular, produce the result one item at a time; in other words, they solve a classification problem at each time step.
Cross-entropy loss is the standard choice for classification, no matter whether the model is a convolutional neural network (example), a recurrent neural network (example), or an ordinary feed-forward neural network (example). If you were to write an RNN that solves a regression problem instead, you'd use a different loss function, such as the L2 loss.
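To make the per-time-step classification concrete, here is a minimal NumPy sketch of the cross-entropy loss at a single time step of a char-rnn. The 5-character vocabulary, the logit values, and the target index are made up for illustration:

```python
import numpy as np

# Hypothetical setup: a char-rnn over a 5-character vocabulary.
# At one time step the network emits unnormalized scores (logits);
# the target is the index of the true next character.
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
target = 0  # index of the correct character

# Softmax turns logits into a probability distribution
# (subtracting the max first for numerical stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Cross-entropy is the negative log-probability assigned to the target.
loss = -np.log(probs[target])
print(loss)
```

The loss is small when the network puts high probability on the correct next character and grows without bound as that probability goes to zero, which is exactly the training signal you want for a per-step classifier.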
All of the examples above use TensorFlow and the tf.nn.softmax_cross_entropy_with_logits loss.
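For reference, TensorFlow's tf.nn.softmax_cross_entropy_with_logits takes raw scores and applies the softmax internally, which is more numerically stable than composing softmax and log yourself. A small NumPy sketch of the same computation (the labels and logits below are made up for illustration):

```python
import numpy as np

def softmax_cross_entropy_with_logits(labels, logits):
    # Same idea as TensorFlow's tf.nn.softmax_cross_entropy_with_logits:
    # the caller passes raw logits, and softmax is folded into the loss.
    # labels: one-hot (or soft) targets, shape (batch, classes)
    # logits: raw scores, shape (batch, classes)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)

labels = np.array([[1.0, 0.0, 0.0]])  # one-hot target
logits = np.array([[2.0, 1.0, 0.1]])  # raw scores
print(softmax_cross_entropy_with_logits(labels, logits))
```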

Maxim