
I would like to count the tokens of my OpenAI API request in R before sending it (model gpt-3.5-turbo). Since the OpenAI API has rate limits, this seems important to me.

Example:

The function I use to send requests:

ask_chatgpt <- function(prompt) {
  response <- POST(
    url = "https://api.openai.com/v1/chat/completions", 
    add_headers(Authorization = paste("Bearer", api_key)),
    content_type_json(),
    encode = "json",
    body = list(
      model = "gpt-3.5-turbo",
      messages = list(list(
        role = "user", 
        content = prompt
      ))
    )
  )
  str_trim(content(response)$choices[[1]]$message$content)
}

Full example:

api_key <- "your_openai_api_key" 

library(httr)
library(tidyverse)

#Calls the ChatGPT API with the given prompt and returns the answer
ask_chatgpt <- function(prompt) {
  response <- POST(
    url = "https://api.openai.com/v1/chat/completions", 
    add_headers(Authorization = paste("Bearer", api_key)),
    content_type_json(),
    encode = "json",
    body = list(
      model = "gpt-3.5-turbo",
      messages = list(list(
        role = "user", 
        content = prompt
      ))
    )
  )
  str_trim(content(response)$choices[[1]]$message$content)
}

prompt <- "how do I count the token in R for gpt-3.45-turbo?"

ask_chatgpt(prompt)
#> [1] "As an AI language model, I am not sure what you mean by \"count the token in R for gpt-3.5-turbo.\" Please provide more context or clarification so that I can better understand your question and provide an appropriate answer."

Created on 2023-04-24 with reprex v2.0.2

I would like to calculate/estimate how many tokens prompt will need with gpt-3.5-turbo.

There is a similar question for GPT-3 and Python, where the tiktoken library is recommended. However, I could not find a similar library in R.

OpenAI also recommends tiktoken for Python, or the gpt-3-encoder package for JavaScript.
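If you only need a rough estimate and want to avoid any Python dependency, OpenAI's documentation suggests that typical English text averages about 4 characters per token. A minimal base-R sketch of that heuristic (`estimate_tokens` is a hypothetical helper name, not an existing function, and this is not the real tokenizer):

```r
# Rough heuristic only: OpenAI's docs suggest ~4 characters per token
# for typical English text. This is NOT the real tokenizer, just a
# quick ballpark estimate before sending a request.
estimate_tokens <- function(text) {
  ceiling(nchar(text) / 4)
}

estimate_tokens("how do I count the token in R for gpt-3.45-turbo?")
```

Expect it to be off by several tokens for code, non-English text, or unusual strings; for exact counts you still need the real tokenizer.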


2 Answers


OpenAI uses its own tokenizer, so you probably won't be able to reproduce it natively in R. Instead, I would recommend calling their Python tiktoken library via the reticulate package.

First, install the tiktoken package via the command line using:

pip install tiktoken

Then, in R

library(reticulate)
tiktoken <- import("tiktoken")                            # Python tiktoken via reticulate
encoding <- tiktoken$encoding_for_model("gpt-3.5-turbo")  # tokenizer used by this model
prompt <- "how do I count the token in R for gpt-3.45-turbo?"
length(encoding$encode(prompt))                           # number of tokens
# [1] 19
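Note that this counts only the content string. The billed prompt of a chat request also includes per-message framing tokens. Based on OpenAI's cookbook (`num_tokens_from_messages`), for gpt-3.5-turbo each message adds roughly 3 tokens, plus about 3 more to prime the assistant's reply; the exact overhead varies by model version, so treat these constants as assumptions. A sketch in base R (`chat_request_tokens` is a hypothetical helper):

```r
# Hedged sketch: total prompt tokens for a chat request, given the
# token counts of each message's content. The overhead constants are
# assumptions taken from OpenAI's cookbook and may vary by model version.
chat_request_tokens <- function(content_token_counts) {
  per_message_overhead <- 3  # assumed framing tokens per message
  reply_priming <- 3         # assumed tokens priming the assistant reply
  sum(content_token_counts + per_message_overhead) + reply_priming
}

# e.g. a single user message whose content encodes to 19 tokens:
chat_request_tokens(19)  # 19 + 3 + 3 = 25
```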

Here is how I do it. With these helpers you can get the encoding, the tokens, and the token count from either a model name (e.g. "gpt-3.5-turbo") or an encoding name (e.g. "cl100k_base").

# token_helper.R
library(reticulate)
tiktoken <- import("tiktoken")

# Returns a tiktoken encoding from either an encoding name
# (e.g. "cl100k_base") or a model name (e.g. "gpt-3.5-turbo")
encoding_getter <- function(encoding_type) {
  if (grepl("k_base", encoding_type)) {
    tiktoken$get_encoding(encoding_type)
  } else {
    tiktoken$encoding_for_model(encoding_type)
  }
}

# Encodes a string into its token ids
tokenizer <- function(string, encoding_type) {
  encoding <- encoding_getter(encoding_type)
  encoding$encode(string)
}

# Number of tokens the string encodes to
token_counter <- function(string, encoding_type) {
  length(tokenizer(string, encoding_type))
}

To count tokens, you can call the token_counter function as shown below:

source("token_helper.R")  # token_helper.R is an R script, so source() it; reticulate's import() is only for Python modules
token_counter("This string will be counted as tokens", "gpt-3.5-turbo")