I am making a request to the completions endpoint. My prompt is 1360 tokens, as verified by the Playground and the Tokenizer. I won't show the prompt as it's a little too long for this question.

Here is my request to OpenAI in Node.js using the openai npm package.

const response = await openai.createCompletion({
  model: 'text-davinci-003',
  prompt,
  max_tokens: 4000,
  temperature: 0.2
})

When testing in the Playground, my total token count after the response is 1374.

When submitting my prompt via the completions API I am getting the following error:

error: {
  message: "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (1360 in your prompt; 4000 for the completion). Please reduce your prompt; or completion length.",
  type: 'invalid_request_error',
  param: null,
  code: null
}

If you have been able to solve this one, I'd love to hear how you did it.

Rok Benko
Kane Hooper

3 Answers

The model's context window is shared between the prompt and the completion: the prompt tokens plus max_tokens together must not exceed the token limit of the particular OpenAI model.

As stated in the official OpenAI article:

Depending on the model used, requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most.
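That arithmetic can be sketched in a few lines of Node.js. The 4,097-token limit below is specific to text-davinci-003, and completionBudget is just an illustrative helper name, not an OpenAI API:

```javascript
// The model's context window is shared between prompt and completion.
// 4,097 tokens is the limit for text-davinci-003 specifically.
const CONTEXT_LIMIT = 4097;

// Largest completion that still fits alongside a prompt of a given size.
function completionBudget(promptTokens, contextLimit = CONTEXT_LIMIT) {
  return Math.max(0, contextLimit - promptTokens);
}

console.log(completionBudget(4000)); // 97, matching the quote above
console.log(completionBudget(1360)); // 2737 left for a 1360-token prompt
```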

The limit is currently a technical limitation, but there are often creative ways to solve problems within the limit, e.g. condensing your prompt, breaking the text into smaller pieces, etc.
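As a rough illustration of the "smaller pieces" idea, here is a naive chunker. It assumes roughly 4 characters per token, which is only a crude heuristic for English text; for real limits you would count with an actual tokenizer:

```javascript
// Naive text chunker: splits on a character budget derived from a
// rough ~4 characters-per-token heuristic (NOT an exact token count).
function chunkText(text, maxTokensPerChunk) {
  const maxChars = maxTokensPerChunk * 4;
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

Each chunk can then be sent as its own request whose prompt fits the model's context window; a smarter version would split on sentence or paragraph boundaries instead of mid-word.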

Note: For counting tokens before(!) sending an API request, see this answer.

GPT-4 models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
gpt-4 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration 2 weeks after it is released. | 8,192 tokens | Up to Sep 2021
gpt-4-0613 | Snapshot of gpt-4 from June 13th 2023 with function calling data. Unlike gpt-4, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 8,192 tokens | Up to Sep 2021
gpt-4-32k | Same capabilities as the base gpt-4 model but with 4x the context length. Will be updated with our latest model iteration. | 32,768 tokens | Up to Sep 2021
gpt-4-32k-0613 | Snapshot of gpt-4-32k from June 13th 2023. Unlike gpt-4-32k, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 32,768 tokens | Up to Sep 2021
gpt-4-0314 (Legacy) | Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest. | 8,192 tokens | Up to Sep 2021
gpt-4-32k-0314 (Legacy) | Snapshot of gpt-4-32k from March 14th 2023. Unlike gpt-4-32k, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest. | 32,768 tokens | Up to Sep 2021

GPT-3.5 models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration 2 weeks after it is released. | 4,096 tokens | Up to Sep 2021
gpt-3.5-turbo-16k | Same capabilities as the standard gpt-3.5-turbo model but with 4 times the context. | 16,384 tokens | Up to Sep 2021
gpt-3.5-turbo-0613 | Snapshot of gpt-3.5-turbo from June 13th 2023 with function calling data. Unlike gpt-3.5-turbo, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 4,096 tokens | Up to Sep 2021
gpt-3.5-turbo-16k-0613 | Snapshot of gpt-3.5-turbo-16k from June 13th 2023. Unlike gpt-3.5-turbo-16k, this model will not receive updates, and will be deprecated 3 months after a new version is released. | 16,384 tokens | Up to Sep 2021
gpt-3.5-turbo-0301 (Legacy) | Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates, and will be deprecated on June 13th 2024 at the earliest. | 4,096 tokens | Up to Sep 2021

GPT-3 models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
text-davinci-003 (Legacy) | Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports some additional features such as inserting text. | 4,097 tokens | Up to Jun 2021
text-davinci-002 (Legacy) | Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning. | 4,097 tokens | Up to Jun 2021
text-curie-001 | Very capable, faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019
text-babbage-001 | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019
text-ada-001 | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019
davinci | Most capable GPT-3 model. Can do any task the other models can do, often with higher quality. | 2,049 tokens | Up to Oct 2019
curie | Very capable, but faster and lower cost than Davinci. | 2,049 tokens | Up to Oct 2019
babbage | Capable of straightforward tasks, very fast, and lower cost. | 2,049 tokens | Up to Oct 2019
ada | Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. | 2,049 tokens | Up to Oct 2019

GPT base models:

LATEST MODEL | DESCRIPTION | MAX TOKENS | TRAINING DATA
davinci-002 | Replacement for the GPT-3 curie and davinci base models. | 16,384 tokens | Up to Sep 2021
babbage-002 | Replacement for the GPT-3 ada and babbage base models. | 16,384 tokens | Up to Sep 2021
Rok Benko

  • Thanks Cervus, this cleared up a misunderstanding I had. – Kane Hooper Feb 09 '23 at 10:23
  • I have a summarization task with a long input text. Is there a workaround to handle longer input texts? – M.Hossein Rahimi Mar 22 '23 at 03:02
  • @M.HosseinRahimi Hm, I've seen a few questions regarding this. The only easy option I see is breaking the text into smaller pieces. At the end, you would have, let's say, five summaries, and then you could do a final summarization. Is this an option? – Rok Benko Mar 22 '23 at 09:10
  • If I choose the gpt-3.5-turbo-16k model and my total token usage for a request is 5k tokens, will 4k tokens be charged at the 4k rate and the remaining 1k be charged at the 16k rate? – thdoan Jul 17 '23 at 07:09
  • @thdoan I think the answer is no. Pricing depends on the model used, not on the number of tokens spent. I know what you mean, but as far as I know, the answer is no. See [Pricing](https://openai.com/pricing). – Rok Benko Jul 17 '23 at 08:13

This was solved by Reddit user 'bortlip'.

The max_tokens parameter defines the maximum number of tokens in the response (the completion), not the total budget for the request.

From OpenAI:

https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens

The token count of your prompt plus max_tokens cannot exceed the model's context length.

Therefore, to solve the issue, I subtract the prompt's token count from the model's context length and pass the result as max_tokens, and it works just fine.
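A minimal sketch of that fix, assuming text-davinci-003's 4,097-token context length and a prompt whose token count was measured beforehand (fitMaxTokens is my own helper name, not part of the openai package):

```javascript
const CONTEXT_LIMIT = 4097; // text-davinci-003

// max_tokens value that still fits after a prompt of the given size.
function fitMaxTokens(promptTokens, contextLimit = CONTEXT_LIMIT) {
  return contextLimit - promptTokens;
}

// With the 1,360-token prompt from the question:
//
// const response = await openai.createCompletion({
//   model: 'text-davinci-003',
//   prompt,
//   max_tokens: fitMaxTokens(1360), // 2737 instead of 4000
//   temperature: 0.2,
// });
```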

Kane Hooper

An important note for gpt-3.5-turbo and gpt-4 users, as per documentation:

ChatGPT models like gpt-3.5-turbo and gpt-4 use tokens in the same way as older completions models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.

Please refer to the OpenAI Cookbook for examples of how to deal with this if you're receiving this error because of wrongly calculated tokens. Also see the official docs with an example.
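For a rough sense of where the extra tokens come from, here is a simplified estimator modeled on the Cookbook's num_tokens_from_messages example. The per-message overhead of 4 tokens and the 3-token reply priming follow the Cookbook's values for gpt-3.5-turbo/gpt-4 at the time of writing, but the chars/4 string approximation is my own shortcut; use tiktoken for exact counts:

```javascript
// Simplified estimator for chat-format token usage.
// Per-message (+4) and reply-priming (+3) overheads follow the OpenAI
// Cookbook; string lengths are approximated as ceil(chars / 4), which
// is a crude stand-in for a real tokenizer such as tiktoken.
function estimateChatTokens(messages) {
  let tokens = 3; // every reply is primed with <|start|>assistant<|message|>
  for (const m of messages) {
    tokens += 4; // per-message formatting overhead
    tokens += Math.ceil((m.role.length + m.content.length) / 4);
  }
  return tokens;
}
```

The point is that a chat request costs noticeably more than the sum of the visible message texts, which is why naive counting undershoots and triggers this error.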

Ivan Sivak