
I am currently using GPT-3 and trying to compare its capabilities to related language models for my master's thesis. Unfortunately, GPT-3 is an API-based application, so I am not really able to extract metrics such as perplexity directly.

Over the API I have access to these three metrics, and of course the model's outputs:

  • training_loss: loss on the training batch

  • training_sequence_accuracy: the percentage of completions in the training batch for which the model's predicted tokens matched the true completion tokens exactly. For example, with a batch_size of 3, if your data contains the completions [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 2/3 = 0.67

  • training_token_accuracy: the percentage of tokens in the training batch that were correctly predicted by the model. For example, with a batch_size of 3, if your data contains the completions [[1, 2], [0, 5], [4, 2]] and the model predicted [[1, 1], [0, 5], [4, 2]], this accuracy will be 5/6 = 0.83
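The two accuracy metrics above can be reproduced from the worked examples in their descriptions. A minimal sketch (function names are my own, chosen for illustration):

```python
def sequence_accuracy(true_completions, predicted):
    """Fraction of completions whose predicted tokens match the true tokens exactly."""
    matches = sum(t == p for t, p in zip(true_completions, predicted))
    return matches / len(true_completions)

def token_accuracy(true_completions, predicted):
    """Fraction of individual tokens that were predicted correctly."""
    correct = total = 0
    for true_seq, pred_seq in zip(true_completions, predicted):
        for t, p in zip(true_seq, pred_seq):
            correct += (t == p)
            total += 1
    return correct / total

true_comps = [[1, 2], [0, 5], [4, 2]]
preds = [[1, 1], [0, 5], [4, 2]]
print(round(sequence_accuracy(true_comps, preds), 2))  # 0.67
print(round(token_accuracy(true_comps, preds), 2))     # 0.83
```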

Is there any way to calculate the perplexity of my model using Python?
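One practical route: the Completions API can return per-token log probabilities via its `logprobs` option, and perplexity is just the exponential of the average negative log probability over the evaluated tokens. A minimal sketch, assuming you have already collected the logprobs from the API response (the example values below are made up):

```python
import math

def perplexity_from_logprobs(token_logprobs):
    """Perplexity = exp( -(1/N) * sum_i log p(token_i) )."""
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical per-token log probabilities pulled from an API response:
logprobs = [-0.12, -2.31, -0.54, -1.07]
print(perplexity_from_logprobs(logprobs))
```

Averaging in log space (rather than multiplying raw probabilities) avoids numerical underflow on long sequences.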

Thank you.

Fabian
    One idea that may help you: the `training_loss` can be the perplexity itself, or at least closely related to it, given how perplexity is defined in terms of the loss. Did you check that? – meti Mar 31 '22 at 12:31
  • That would mean a ridiculously low perplexity. The final training token loss is 0.18, which would imply a perplexity of 1.19 — far too low. – Fabian Apr 25 '22 at 11:35

0 Answers