I am building an app around GPT-3, and I would like to know how many tokens every request I make uses. Is this possible, and how?
-
The past tense of the question makes it sound like you're asking for the tokens _after_ a request is made. I'm guessing that's not what's being asked, but if anyone comes across this Q&A looking for the tokens after running a request, it's in the JSON response, in the `usage` object: https://beta.openai.com/docs/api-reference/completions – Chris Hayes Jan 08 '23 at 18:06
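For reference, a minimal sketch of reading that `usage` object with the openai Python package (the model name is just an example and YOUR_API_KEY is a placeholder):
import openai
openai.api_key = "YOUR_API_KEY"  # placeholder; set your real key
response = openai.Completion.create(model="text-davinci-003", prompt="Hello world", max_tokens=5)
# the usage object reports what the request actually consumed
print(response["usage"]["prompt_tokens"])
print(response["usage"]["completion_tokens"])
print(response["usage"]["total_tokens"])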
6 Answers
Counting Tokens with Actual Tokenizer
To do this in Python, first install the transformers package to enable the GPT-2 tokenizer, which is the same tokenizer used for GPT-3:
pip install transformers
Then, to tokenize the string "Hello world", you have a choice of using GPT2TokenizerFast or GPT2Tokenizer.
from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])
or
from transformers import GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
number_of_tokens = len(tokenizer("Hello world")['input_ids'])
In either case, tokenizer() produces a Python list of token IDs representing the string, which can then be counted with len(). The documentation doesn't mention any differences in behavior between the two methods. I tested both methods on both text and code and they gave the same numbers. The from_pretrained methods are unpleasantly slow: 28s for GPT2Tokenizer, and 56s for GPT2TokenizerFast. The load time dominates the experience, so I suggest NOT using the "fast" method. (Note: the first time you run either of the from_pretrained methods, a 3MB model will be downloaded and installed, which takes a couple of minutes.)
Approximating Token Counts
The tokenizers are slow and heavy, but approximations can be made to convert between character counts and token counts, using nothing but the number of characters or tokens. I developed the following approximations by observing the behavior of the GPT-2 tokenizer. They hold well for English text and Python code. The 3rd and 4th functions are perhaps the most useful since they let us quickly fit a text within GPT-3's token limit.
import math

def nchars_to_ntokens_approx(nchars):
    # returns an estimate of #tokens corresponding to #characters nchars
    return max(0, int((nchars - 2) * math.exp(-1)))

def ntokens_to_nchars_approx(ntokens):
    # returns an estimate of #characters corresponding to #tokens ntokens
    return max(0, int(ntokens * math.exp(1)) + 2)

def nchars_leq_ntokens_approx(maxTokens):
    # returns a number of characters very likely to correspond to <= maxTokens
    sqrt_margin = 0.5
    lin_margin = 1.010175047  # = e - 1.001 - sqrt(1 - sqrt_margin); ensures a return of 1 when maxTokens=1
    return max(0, int(maxTokens * math.exp(1) - lin_margin - math.sqrt(max(0, maxTokens - sqrt_margin))))

def truncate_text_to_maxTokens_approx(text, maxTokens):
    # returns a truncation of text to make it (likely) fit within a token limit
    # so the output string is very likely to have <= maxTokens, no guarantees though
    char_index = min(len(text), nchars_leq_ntokens_approx(maxTokens))
    return text[:char_index]
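For example, a quick sanity check of the approximations (the sample text is an arbitrary assumption; the numbers in the comments come from evaluating the formulas above):
text = "OpenAI's models process text by breaking it down into tokens."
print(nchars_to_ntokens_approx(len(text)))          # rough token estimate from the character count
print(ntokens_to_nchars_approx(100))                # 273: rough character budget for 100 tokens
print(truncate_text_to_maxTokens_approx(text, 10))  # prefix of text that (likely) fits within 10 tokens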

-
It's pretty fast to me, almost instantaneous. I don't know why you got 56s. – off99555 Jan 08 '23 at 17:43
-
It's making some network calls, so it depends on your network speed. When I sit further from my wifi antenna it takes even longer. – Schroeder Jan 09 '23 at 08:29
-
Yes, 56 seconds; as in almost a minute. It’s interesting that it runs fast for you. I wonder what’s going on. – Schroeder Jan 10 '23 at 15:49
-
I did time it and got 3.74 ms per call on a text with 2000 tokens using GPT2TokenizerFast. Specifically my text is `"hello world" * 1000`. This doesn't require internet access because the model is already downloaded. Maybe you don't have a GPU so it's very slow. But I don't see GPU usage going up on my laptop when running the code either. Not sure what's going on. It doesn't make sense that a tokenizer will be that slow. – off99555 Jan 10 '23 at 16:07
-
I'm running on a machine with an Nvidia RTX A2000 GPU. The super slow part for me is the line tokenizer = GPT2TokenizerFast.from_pretrained("gpt2"), so it has nothing to do with the prompt. – Schroeder Jan 11 '23 at 08:37
-
It still only takes 2.5 seconds to load the tokenizer for me. The tokenizer is already downloaded. – off99555 Jan 12 '23 at 11:14
-
My best guess for the cause of this timing difference is that I'm running into an incompatibility between TensorFlow (only able to handle CUDA <= v11.2) and my CUDA 12 installation, and this is preventing the use of AVX2 FMA instructions. Loading the tokenizer throws warnings about this. – Schroeder Jan 17 '23 at 14:03
OpenAI charges GPT-3 usage by tokens, and this counts both the prompt and the answer. For OpenAI, 750 words is equivalent to roughly 1,000 tokens, i.e. a token-to-word ratio of about 1.33. Pricing of the tokens depends on the plan you are on.
I do not know of more accurate ways of estimating cost. Perhaps using the GPT-2 tokenizer from Hugging Face can help. I know the tokens from the GPT-2 tokenizer are accepted when passed to GPT-3 in the logit bias array, so there is a degree of equivalence between GPT-2 tokens and GPT-3 tokens.
However, GPT-2 and GPT-3 are different models, and GPT-3 famously has more parameters than GPT-2, so GPT-2 estimates are probably a little off token-wise. I am sure you can write a simple program that estimates the price by comparing prompts and token usage, but that might take some time.
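For example, a rough estimator along those lines could look like the sketch below; the price per 1,000 tokens is an assumed placeholder, so substitute your own plan's rate:
from transformers import GPT2TokenizerFast

PRICE_PER_1K_TOKENS = 0.02  # assumed example rate; check your own model/plan pricing

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def estimate_cost(prompt: str, completion: str) -> float:
    # GPT-3 bills both the prompt and the completion, so count tokens in both
    n_tokens = len(tokenizer(prompt)['input_ids']) + len(tokenizer(completion)['input_ids'])
    return n_tokens / 1000 * PRICE_PER_1K_TOKENS

print(estimate_cost("Hello world", "Hi there, how can I help?"))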

Here is an example from openai-cookbook that worked perfectly for me:
import tiktoken
def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens
num_tokens_from_string("tiktoken is great!", "gpt2")
>6
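If tiktoken isn't installed yet, it's a regular PyPI package:
pip install tiktoken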

Code to count how many tokens a GPT-3 request used:
from transformers import GPT2TokenizerFast

def count_tokens(input: str):
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    res = tokenizer(input)['input_ids']
    return len(res)

print(count_tokens("Hello world"))

-
Keep the tokenizer initialization outside the function (e.g. in `__init__`) to make this run much faster. – Fábio Perez Nov 18 '22 at 17:51
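A sketch of that suggestion applied to the answer above (same counting logic, but the tokenizer is loaded once instead of on every call):
from transformers import GPT2TokenizerFast

# load the tokenizer once, at import time, rather than inside the function
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def count_tokens(input: str) -> int:
    return len(tokenizer(input)['input_ids'])

print(count_tokens("Hello world"))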
Here is how I do it with Python 3. You can pass either the model name or the encoding string, and you can get the encoding, the tokens, or the token count.
token_helper.py:
import tiktoken

def encoding_getter(encoding_type: str):
    """
    Returns the appropriate encoding based on the given encoding type
    (either an encoding string or a model name).
    """
    if "k_base" in encoding_type:
        return tiktoken.get_encoding(encoding_type)
    else:
        return tiktoken.encoding_for_model(encoding_type)

def tokenizer(string: str, encoding_type: str) -> list:
    """
    Returns the tokens in a text string using the specified encoding.
    """
    encoding = encoding_getter(encoding_type)
    tokens = encoding.encode(string)
    return tokens

def token_counter(string: str, encoding_type: str) -> int:
    """
    Returns the number of tokens in a text string using the specified encoding.
    """
    num_tokens = len(tokenizer(string, encoding_type))
    return num_tokens
It works like this:
>>> import token_helper
>>> token_helper.token_counter("This string will be counted as tokens", "gpt-3.5-turbo")
7
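Because tiktoken maps gpt-3.5-turbo to the cl100k_base encoding, passing the encoding name directly (the "k_base" branch of encoding_getter) should give the same count:
>>> token_helper.token_counter("This string will be counted as tokens", "cl100k_base")
7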

For C# users, you can refer to this Git repo: https://github.com/betalgo/openai You can take the tokenizing elements (Tokenizer/GPT3) from the repo and create a helper in your codebase. (Note: I've used this tokenizer and it's pretty accurate, but it isn't recommended for production use.)
