
I see some GitHub comments saying that the loss returned by the model() call is in the form of perplexity: https://github.com/huggingface/transformers/issues/473

But when I look at the relevant code... https://huggingface.co/transformers/_modules/transformers/modeling_openai.html#OpenAIGPTLMHeadModel.forward

    if labels is not None:
        # Shift so that tokens < n predict n
        shift_logits = lm_logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten the tokens
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
        outputs = (loss,) + outputs

    return outputs  # (loss), lm_logits, (all hidden states), (all attentions)

I see cross entropy being calculated, but no transformation into perplexity. Where does the loss finally get transformed? Or is there a transformation already there that I'm not understanding?

user947659
  • Could [this](https://jiangnanhugo.github.io/2016/perplexity-vs-cross-entropy/) be the answer to your question? I don't fully understand the article (which is why I'm only posting it as a comment), but it seems that there is an inherent relation between perplexity and CE loss... – dennlinger Mar 24 '20 at 15:06
  • There is a relationship, in that you need to calculate CE to get perplexity. I guess I am just confused as to where they are doing the 2^(CE Loss) in the code... – user947659 Mar 24 '20 at 15:31

2 Answers


Ah, OK, I found the answer. The code is actually returning cross entropy. In the GitHub comment where they say it is perplexity, they say so because the OP does

    return math.exp(loss)

which transforms the cross entropy into perplexity :)
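
For completeness, here is a minimal sketch of the whole thing end to end (my own example, not the OP's code). It assumes a transformers version whose LM-head forward accepts labels= and returns the loss as the first output element, matching the snippet in the question, and uses the standard "openai-gpt" checkpoint.

    import math
    import torch
    from transformers import OpenAIGPTLMHeadModel, OpenAIGPTTokenizer

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
    model.eval()

    text = "the quick brown fox jumps over the lazy dog"
    input_ids = tokenizer.encode(text, return_tensors="pt")

    with torch.no_grad():
        # With labels supplied, the first output element is the mean token-level
        # cross entropy (in nats, because CrossEntropyLoss uses the natural log).
        loss = model(input_ids, labels=input_ids)[0]

    perplexity = math.exp(loss.item())  # e^(cross entropy) = perplexity
    print(loss.item(), perplexity)

So the model only ever hands you the cross entropy; the exp() is up to you.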

user947659
  • Did you conclude that based on the definitions of cross entropy and perplexity? I have a rough idea of what both are, but I don't know the details well enough to draw the same conclusion. If you have any recommendations/sources I'd be grateful. – stan0 Jun 21 '20 at 16:28
  • Yup, but looking at the equations it is pretty straightforward to step through the code and see what is happening. Thankfully this is open source! – user947659 Jun 30 '20 at 18:00

No LaTeX, no problem. By definition the perplexity (often written PP) is:

PP(p) = e^(H(p))

where H denotes the entropy. In the general case, for a model q evaluated against the true distribution p, the same relation holds with the cross entropy:

PP(p, q) = e^(H(p, q))

Here e is the base of the natural logarithm, which is the base PyTorch uses to compute the entropy and cross entropy, so exponentiating the returned loss with e (rather than 2) gives the perplexity.
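
If you want to convince yourself numerically, here is a toy check (my own example, not part of the original answer): with identical logits for every class the model predicts a uniform distribution over V classes, so the cross entropy is ln(V) nats and the perplexity is e^(ln V) = V.

    import torch
    from torch.nn import CrossEntropyLoss

    V = 10                                # "vocabulary" size
    logits = torch.zeros(4, V)            # 4 tokens, all-equal logits -> uniform predictions
    targets = torch.tensor([1, 3, 5, 7])  # arbitrary target ids

    loss = CrossEntropyLoss()(logits, targets)
    print(loss.item())             # ~2.3026 = ln(10)
    print(torch.exp(loss).item())  # ~10.0, the perplexity of a uniform model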

prosti