2

tf.nn.softmax_cross_entropy_with_logits: the documentation says that it computes softmax cross entropy between logits and labels. What does that mean? Is it not applying the cross-entropy loss function formula to them? Why does the documentation say that it computes softmax cross entropy?

Marco A.
R.K
    I think all it means is that it will apply softmax to logits and then compute cross entropy against the labels. Just a shortcut for a common theme in classification models. – Mad Wombat May 22 '17 at 22:38
  • You should use softmax with cross entropy function as it is numerically stable. Read more on that [here](https://stackoverflow.com/a/34243720/1586200). – Autonomous May 23 '17 at 00:23

1 Answer

0

Also from the Docs:

Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class).

Softmax classification uses the cross-entropy loss function to train and classify data among discrete classes. There are other activation functions, such as ReLU (Rectified Linear Units) or Sigmoid, that are used in linear classification and neural networks; in this case Softmax is used.

Activation functions are decision functions (the ones that actually classify data into categories), and cross-entropy is the function used to calculate the error during training (you could use other ways to calculate the error cost, such as mean squared error). However, cross-entropy currently seems to be the preferred way to calculate it.
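To make the error calculation concrete, here is a minimal NumPy sketch comparing cross-entropy with mean squared error for a single example (the label and predicted probabilities are made-up values, not from the question):

```python
import numpy as np

# One-hot label: the true class is the second one (made-up example)
label = np.array([0.0, 1.0, 0.0])

# Predicted probabilities, e.g. the output of a softmax layer
probs = np.array([0.1, 0.7, 0.2])

# Cross-entropy: -sum_i label_i * log(prob_i)
cross_entropy = -np.sum(label * np.log(probs))

# Mean squared error, another possible error measure
mse = np.mean((label - probs) ** 2)

print(cross_entropy)  # ~0.357
print(mse)            # ~0.047
```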

As some have pointed out, "softmax cross-entropy" is simply a commonly used term in classification, a convenient shorthand for applying softmax to the logits and then computing the cross-entropy loss against the labels.
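As an illustration of that shorthand, here is a minimal sketch (assuming TensorFlow 2.x with eager execution; the logits and labels are invented values) showing that the fused op gives the same result as applying softmax and then computing cross-entropy by hand:

```python
import tensorflow as tf

# Raw, unscaled scores (logits) for 2 examples and 3 classes (made-up values)
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, -1.0]])
# One-hot labels for the same 2 examples
labels = tf.constant([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])

# Fused op: softmax + cross-entropy in one numerically stable step
fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Manual two-step version: softmax first, then cross-entropy
probs = tf.nn.softmax(logits)
manual = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)

print(fused.numpy())   # per-example losses
print(manual.numpy())  # same values (up to floating-point error)
```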

Edit

Regarding the logits, it means that the function works with unscaled input data. In other words, the inputs need not be probability values (i.e., values may be greater than 1 or negative). Check this question to learn more about softmax_cross_entropy_with_logits and its components.
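As a small illustration of that point (again a sketch assuming TensorFlow 2.x; the numbers are invented), the logits can be any real values, such as the raw output of a final dense layer with no activation; the op applies softmax internally, so you should not softmax them yourself first:

```python
import tensorflow as tf

# Logits are unscaled scores: they can be negative or greater than 1
logits = tf.constant([[4.2, -1.3, 0.0]])
labels = tf.constant([[1.0, 0.0, 0.0]])

# Correct: pass the raw logits; softmax is applied inside the op
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Incorrect: passing already-softmaxed probabilities applies softmax twice
wrong = tf.nn.softmax_cross_entropy_with_logits(
    labels=labels, logits=tf.nn.softmax(logits))

print(loss.numpy(), wrong.numpy())  # the two values differ
```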

DarkCygnus
  • Your response doesn't explain why the function uses *logits* and not *softmax* as input. – Anton Codes May 23 '17 at 03:51
  • OP doesn't ask that, they asked for an explanation of softmax in that function – DarkCygnus May 23 '17 at 06:46
  • 1
    Expanded my answer based on @wontonimo 's comment – DarkCygnus May 23 '17 at 17:11
  • you note that "There are other activation functions used like ReLU (Rectified Linear Units) or Sigmoid that are used in Linear Classification and NN; in this case Softmax is used." this comment is misleading: softmax is not comparable to those activation functions. – mynameisvinn Nov 12 '17 at 03:19
  • @vin I was stating that there are *other* activation functions. And yes, although they are not the same, they are comparable in that they all transform the inputs received through the network into a certain range; in the case of `Softmax`, a probability distribution (sums to 1, all elements between 0 and 1) is returned. Hence its application in classification algorithms and encoding. It is not misleading, unless you don't know much about NNs and those functions, but in that case the OP wouldn't be asking this. – DarkCygnus Nov 12 '17 at 06:43