
Can someone please explain the calculation of the error in backpropagation that is found in many code examples, such as:

error = calculated - target
// then compute the gradient of the error with respect to each parameter...

Is this the same for squared error and cross-entropy error? How?

Thanks...

Turkdogan Tasdelen

1 Answer


I will denote by x an example from the training set, by f(x) the prediction of your network for this particular example, and by g_x the ground truth (label) associated with x.

The short answer is: the root mean squared error (RMS) is used when you have a network that can exactly, and differentiably, predict the labels that you want. The cross-entropy error is used when your network predicts scores for a set of discrete labels.

To clarify, you usually use root mean squared (RMS) error when you want to predict values that vary continuously. Imagine you want your network to predict vectors in R^n. This is the case when, for example, you want to predict surface normals or optical flow. These values change continuously, and ||f(x)-g_x|| is differentiable, so you can use backprop and train your network.
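
As a minimal sketch (with made-up numbers) of why the line error = calculated - target shows up in squared-error code: if the loss for one example is L = 0.5 * ||f(x) - g_x||^2, then the gradient of L with respect to the output f(x) is exactly f(x) - g_x, i.e. "calculated minus target".

import numpy as np

# Hypothetical example values: f_x is the network output ("calculated"),
# g_x is the ground truth ("target").
f_x = np.array([0.2, 0.7, 0.1])
g_x = np.array([0.0, 1.0, 0.0])

# Squared-error loss for one example: L = 0.5 * ||f(x) - g_x||^2
loss = 0.5 * np.sum((f_x - g_x) ** 2)

# Gradient of L with respect to the output f(x) is f(x) - g_x,
# which is the familiar "error = calculated - target" line.
error = f_x - g_x
print(loss, error)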

Cross-entropy, on the other hand, is useful in classification with m labels, for example in image classification. In that case, g_x takes the discrete values c_1, c_2, ..., c_m, where m is the number of classes. Now you cannot use RMS, because if you assume that your network predicts the exact labels (i.e. f(x) in {c_1,...,c_m}), then ||f(x)-g_x|| is no longer differentiable, and you cannot use back-propagation. So you build a network that does not compute class labels directly, but instead computes a set of scores s_1,...,s_m, one for each class label. Then you maximize the probability of the correct class by applying a softmax to the scores and maximizing the resulting probability of the correct label (equivalently, minimizing the cross-entropy). This makes the loss function differentiable.
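
This also explains why the same error = calculated - target line can appear in cross-entropy code: for a softmax output combined with a cross-entropy loss, the gradient of the loss with respect to the score s_i is softmax(s)_i minus 1 if i is the correct class (and minus 0 otherwise), i.e. "predicted probabilities minus one-hot target". A minimal sketch, with hypothetical scores and a hypothetical target index:

import numpy as np

# Hypothetical raw scores (logits) for m = 3 classes and the index of the true class.
s = np.array([1.0, 2.0, 0.5])
true_class = 1

# Softmax turns the scores into probabilities (shifted by max(s) for numerical stability).
p = np.exp(s - np.max(s))
p /= p.sum()

# Cross-entropy loss for this example: -log p[true_class]
loss = -np.log(p[true_class])

# One-hot encoding of the target class.
target = np.zeros_like(p)
target[true_class] = 1.0

# Gradient of the cross-entropy loss with respect to the scores s is
# softmax(s) - one_hot(target): once again "calculated - target".
error = p - target
print(loss, error)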

Ash
  • So you are saying that the derivative of the RMS error and of the softmax error function is the same, right? – Turkdogan Tasdelen Aug 03 '17 at 10:21
  • I probably misunderstood your comment, but if by derivation you mean computing the derivatives then no, they are not the same (just write the chain rule for each one). – Ash Aug 03 '17 at 11:05
  • (Although I am not quite sure) as far as I remember, most code implementations of backpropagation include a statement (error=calculated-target) for both the RMS and the cross-entropy error functions. An example: https://stackoverflow.com/questions/40575841/numpy-calculate-the-derivative-of-the-softmax-function I wondered how this is derived. Or am I completely wrong? – Turkdogan Tasdelen Aug 03 '17 at 11:27
  • I am unsure about what you are asking... I suggest you post a new question where you show *actual code* from an implementation. This way you will obtain a precise response. – Ash Aug 03 '17 at 11:37
  • 1
    Thanks, I probably asked a wrong question. I will create a question with more concrete example. – Turkdogan Tasdelen Aug 03 '17 at 11:47