The TensorFlow documentation calls each of the inputs to a softmax a logit, and it goes on to define the softmax's inputs/logits as "unscaled log probabilities."

Wikipedia and other sources say that a logit is the log of the odds, and the inverse of the sigmoid/logistic function. That is, if sigmoid(x) = p, then logit(p) = log(p / (1 - p)) = x.
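
(For concreteness, a minimal numpy sketch of that inverse relationship; the helper names sigmoid/logit are just illustrative, not any library's API:)

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def logit(p):
        return np.log(p / (1.0 - p))

    x = np.array([-2.0, 0.0, 3.5])
    print(logit(sigmoid(x)))  # recovers x up to floating-point error: [-2.  0.  3.5]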

Is there a mathematical or conventional reason for TensorFlow to call a softmax's inputs "logits"? Shouldn't they just be called "unscaled log probabilities"?

Perhaps TensorFlow just wanted to keep the same variable name for binary logistic regression (where it makes sense to use the term logit) and categorical logistic regression...

This question was covered a little bit here, but no one seemed bothered by the use of the word "logit" to mean "unscaled log probability".

Brian Bartoldson

1 Answer

"Logit" is nowadays used in the ML community for any vector of non-normalised scores: basically anything that gets mapped to a probability distribution by a parameter-less transformation, like the sigmoid function for a binary variable or softmax for a multinomial one. It is not a strict mathematical term, but it has gained enough popularity to be included in the TF documentation.
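
(A minimal numpy sketch of this usage; the helper names softmax/sigmoid are illustrative, not TF's API. It shows that any scores normalise to a distribution, and that in the two-class case softmax reduces to the sigmoid of the logit difference, which is how the binary-regression term carried over:)

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())  # subtract the max for numerical stability
        return e / e.sum()

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    logits = np.array([2.0, -1.0])  # arbitrary unnormalised scores
    print(softmax(logits))          # [0.9526 0.0474], a valid distribution
    print(softmax(logits + 5.0))    # unchanged: softmax is shift-invariant

    # With two classes, softmax(z)[0] equals sigmoid(z[0] - z[1]):
    print(sigmoid(logits[0] - logits[1]))  # 0.9526, matches softmax(logits)[0]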

lejlot
  • Do you know why/how this convention came about? Could it have been something like: logits are non-normalized probabilities, therefore non-normalized probabilities are logits? – Brian Bartoldson May 26 '17 at 22:00
  • I am afraid so. People simply started changing what follows the last layer (normalisation schemes, losses) and did not bother to change the name of the last activation. The community clearly needed a name for this object, and the fact that it became "logit" is probably suboptimal (problematic from a math perspective) but it kind of "works". – lejlot May 27 '17 at 13:37