I am very new to neural networks and was wondering why all of the examples of RNNs, especially char-rnns use the crossentropy loss function as their loss function. I have googled but can't seem to come across any discussions on the function in this context. I have been asked to motivate for its use and look at its advantages and disadvantages so any papers or sources that I could read through would be much appreciated.
1 Answer
Many sequence-to-sequence RNNs, and char-rnn in particular, produce the result one item at a time; in other words, they solve a classification problem at each time step.
Cross-entropy loss is the standard choice for classification, no matter whether the model is a convolutional neural network (example), a recurrent neural network (example), or an ordinary feed-forward neural network (example). If you were to write an RNN that solves a regression problem instead, you'd use a different loss function, such as the L2 loss.
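To make the per-time-step classification concrete, here is a minimal NumPy sketch of the cross-entropy loss at a single time step of a char-rnn. The 5-character vocabulary, the logit values, and the target index are made up for illustration:

```python
import numpy as np

# Hypothetical setup: a char-rnn over a 5-character vocabulary.
# At one time step the network emits unnormalized scores (logits);
# the target is the index of the true next character.
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
target = 0  # index of the correct character

# Softmax turns logits into a probability distribution
# (subtracting the max first for numerical stability).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Cross-entropy is the negative log-probability assigned to the target.
loss = -np.log(probs[target])
print(loss)
```

The loss is small when the network puts high probability on the correct next character and grows without bound as that probability goes to zero, which is exactly the training signal you want for a per-step classifier.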
All of the examples above use TensorFlow and the tf.nn.softmax_cross_entropy_with_logits loss.
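For reference, TensorFlow's tf.nn.softmax_cross_entropy_with_logits takes raw scores and applies the softmax internally, which is more numerically stable than composing softmax and log yourself. A small NumPy sketch of the same computation (the labels and logits below are made up for illustration):

```python
import numpy as np

def softmax_cross_entropy_with_logits(labels, logits):
    # Same idea as TensorFlow's tf.nn.softmax_cross_entropy_with_logits:
    # the caller passes raw logits, and softmax is folded into the loss.
    # labels: one-hot (or soft) targets, shape (batch, classes)
    # logits: raw scores, shape (batch, classes)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)

labels = np.array([[1.0, 0.0, 0.0]])  # one-hot target
logits = np.array([[2.0, 1.0, 0.1]])  # raw scores
print(softmax_cross_entropy_with_logits(labels, logits))
```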

Maxim