I am making a request to the completions endpoint. My prompt is 1360 tokens, as verified by the Playground and the Tokenizer. I won't show the prompt as it's a little too long for this question.

Here is my request to OpenAI in Node.js using the openai npm package.

const response = await openai.createCompletion({
  model: 'text-davinci-003',
  prompt,
  max_tokens: 4000,
  temperature: 0.2
})

When testing in the Playground, my total token count after the response is 1374.

When submitting my prompt via the completions API I am getting the following error:

error: {
  message: "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (1360 in your prompt; 4000 for the completion). Please reduce your prompt; or completion length.",
  type: 'invalid_request_error',
  param: null,
  code: null
}

If you have been able to solve this one, I'd love to hear how you did it.

Rok Benko
Kane Hooper

3 Answers

The model's context window is shared between the prompt and the completion: the prompt tokens plus max_tokens together must not exceed the token limit of the particular OpenAI model.

As stated in the official OpenAI article:

Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most.
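That arithmetic can be sketched in a few lines of Node.js. The 4,097-token limit below is specific to text-davinci-003, and completionBudget is just an illustrative helper name, not an OpenAI API:

```javascript
// The model's context window is shared between prompt and completion.
// 4,097 tokens is the limit for text-davinci-003 specifically.
const CONTEXT_LIMIT = 4097;

// Largest completion that still fits alongside a prompt of a given size.
function completionBudget(promptTokens, contextLimit = CONTEXT_LIMIT) {
  return Math.max(0, contextLimit - promptTokens);
}

console.log(completionBudget(4000)); // 97, matching the quote above
console.log(completionBudget(1360)); // 2737 left for a 1360-token prompt
```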

The limit is currently a technical limitation, but there are often creative ways to solve problems within the limit, e.g. condensing your prompt, breaking the text into smaller pieces, etc.
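As a rough illustration of the "smaller pieces" idea, here is a naive chunker. It assumes roughly 4 characters per token, which is only a crude heuristic for English text; for real limits you would count with an actual tokenizer:

```javascript
// Naive text chunker: splits on a character budget derived from a
// rough ~4 characters-per-token heuristic (NOT an exact token count).
function chunkText(text, maxTokensPerChunk) {
  const maxChars = maxTokensPerChunk * 4;
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

Each chunk can then be sent as its own request whose prompt fits the model's context window; a smarter version would split on sentence or paragraph boundaries instead of mid-word.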

Note: For counting tokens before(!) sending an API request, see this answer.

GPT-4 models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
gpt-4 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration 2 weeks after it is released. | 8,192 tokens | Up to Sep 2021
gpt-4-0613 | Snapshot of gpt-4 from June 13th 2023 with function calling data. Unlike gpt-4, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 8,192 tokens | Up to Sep 2021
gpt-4-32k | Same capabilities as the base gpt-4 model but with 4x the context length. Will be updated with our latest model iteration. | 32,768 tokens | Up to Sep 2021
gpt-4-32k-0613 | Snapshot of gpt-4-32k from June 13th 2023. Unlike gpt-4-32k, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 32,768 tokens | Up to Sep 2021
gpt-4-0314 (Legacy) | Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest. | 8,192 tokens | Up to Sep 2021
gpt-4-32k-0314 (Legacy) | Snapshot of gpt-4-32k from March 14th 2023. Unlike gpt-4-32k, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest. | 32,768 tokens | Up to Sep 2021

GPT-3.5 models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration 2 weeks after it is released. | 4,096 tokens | Up to Sep 2021
gpt-3.5-turbo-16k | Same capabilities as the standard gpt-3.5-turbo model but with 4 times the context. | 16,384 tokens | Up to Sep 2021
gpt-3.5-turbo-0613 | Snapshot of gpt-3.5-turbo from June 13th 2023 with function calling data. Unlike gpt-3.5-turbo, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 4,096 tokens | Up to Sep 2021
gpt-3.5-turbo-16k-0613 | Snapshot of gpt-3.5-turbo-16k from June 13th 2023. Unlike gpt-3.5-turbo-16k, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 16,384 tokens | Up to Sep 2021
gpt-3.5-turbo-0301 (Legacy) | Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest. | 4,096 tokens | Up to Sep 2021

GPT-3 models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
text-davinci-003 (Legacy) | Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports some additional features such as inserting text. | 4,097 tokens | Up to Jun 2021
text-davinci-002 (Legacy) | Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning. | 4,097 tokens | Up to Jun 2021
text-curie-001 | Very capable, faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019
text-babbage-001 | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019
text-ada-001 | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019
davinci | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | 2,049 tokens | Up to Oct 2019
curie | Very capable, but faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019
babbage | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019
ada | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019

GPT base models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
davinci-002 | Replacement for the GPT-3 curie and davinci base models. | 16,384 tokens | Up to Sep 2021
babbage-002 | Replacement for the GPT-3 ada and babbage base models. | 16,384 tokens | Up to Sep 2021
Rok Benko

  • Thanks Cervus, this cleared up a misunderstanding I had. – Kane Hooper Feb 09 '23 at 10:23
  • I have a summarization task with a long input text. Is there a workaround to handle longer input texts? – M.Hossein Rahimi Mar 22 '23 at 03:02
  • @M.HosseinRahimi Hm, I've seen a few questions regarding this. The only easy option I see is breaking the text into smaller pieces. At the end, you would have, let's say, five summaries, and then you could do a final summarization. Is this an option? – Rok Benko Mar 22 '23 at 09:10
  • If I choose the gpt-3.5-turbo-16k model and my total token usage for a request is 5k tokens, will 4k tokens be charged at the 4k rate and the remaining 1k be charged at the 16k rate? – thdoan Jul 17 '23 at 07:09
  • @thdoan I think the answer is no. Pricing depends on the model used, not on the number of tokens spent. I know what you mean, but as far as I know, the answer is no. See [Pricing](https://openai.com/pricing). – Rok Benko Jul 17 '23 at 08:13

This was solved by Reddit user 'bortlip'.

The max_tokens parameter defines the maximum number of tokens in the response (the completion), not the total budget for the request.

From OpenAI:

https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens

The token count of your prompt plus max_tokens cannot exceed the model's context length.

Therefore, to solve the issue, I subtract the prompt's token count from the model's context length and pass the result as max_tokens, and it works just fine.
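A minimal sketch of that fix, assuming text-davinci-003's 4,097-token context length and a prompt whose token count was measured beforehand (fitMaxTokens is my own helper name, not part of the openai package):

```javascript
const CONTEXT_LIMIT = 4097; // text-davinci-003

// max_tokens value that still fits after a prompt of the given size.
function fitMaxTokens(promptTokens, contextLimit = CONTEXT_LIMIT) {
  return contextLimit - promptTokens;
}

// With the 1,360-token prompt from the question:
//
// const response = await openai.createCompletion({
//   model: 'text-davinci-003',
//   prompt,
//   max_tokens: fitMaxTokens(1360), // 2737 instead of 4000
//   temperature: 0.2,
// });
```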

Kane Hooper

An important note for gpt-3.5-turbo and gpt-4 users, as per documentation:

ChatGPT models like gpt-3.5-turbo and gpt-4 use tokens in the same way as older completions models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.

Please refer to the OpenAI Cookbook for examples of how to deal with this if you're receiving this error because of wrongly calculated tokens. Also see the official docs with an example.
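For a rough sense of where the extra tokens come from, here is a simplified estimator modeled on the Cookbook's num_tokens_from_messages example. The per-message overhead of 4 tokens and the 3-token reply priming follow the Cookbook's values for gpt-3.5-turbo/gpt-4 at the time of writing, but the chars/4 string approximation is my own shortcut; use tiktoken for exact counts:

```javascript
// Simplified estimator for chat-format token usage.
// Per-message (+4) and reply-priming (+3) overheads follow the OpenAI
// Cookbook; string lengths are approximated as ceil(chars / 4), which
// is a crude stand-in for a real tokenizer such as tiktoken.
function estimateChatTokens(messages) {
  let tokens = 3; // every reply is primed with <|start|>assistant<|message|>
  for (const m of messages) {
    tokens += 4; // per-message formatting overhead
    tokens += Math.ceil((m.role.length + m.content.length) / 4);
  }
  return tokens;
}
```

The point is that a chat request costs noticeably more than the sum of the visible message texts, which is why naive counting undershoots and triggers this error.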

Ivan Sivak