
I am looking for a way to limit the tokens per minute when saving embeddings in a Chroma vector store. Here is my code:

[...]
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# select which embeddings we want to use
embeddings = OpenAIEmbeddings()
# create the vector store to use as the index
db = Chroma.from_documents(texts, embeddings)
[...]

I receive the following error:

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-embedding-ada-002 in organization org-xxx on tokens per min. Limit: 1000000 / min. Current: 1 / min. Contact us through our help center at help.openai.com if you continue to have issues..

Since the .from_documents function is provided by the LangChain/Chroma library, it cannot be edited. Does anyone know a way to limit the tokens per minute when storing many text chunks and embeddings in a vector store? I thought about creating multiple sets of text chunks and saving them to the db set by set, for example with the .persist function (see the sketch below). However, as far as I understood, this would overwrite the db every time. I couldn't find a solution in the LangChain or Chroma documentation.
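For illustration, here is a sketch of the batching idea I have in mind; batch_size, the "./chroma_db" directory, and the 60-second pause are just placeholders:

import time

batch_size = 100  # placeholder: number of chunks per set
for i in range(0, len(texts), batch_size):
    db = Chroma.from_documents(texts[i:i + batch_size], embeddings, persist_directory="./chroma_db")
    db.persist()  # does this overwrite the sets saved before?
    time.sleep(60)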

Thanks a lot for the help.

Regards

Heka

1 Answer


This happens because OpenAI limits how many tokens you can send per minute, which is the tokens-per-minute rate limit shown in your error message.

The solution I found is to feed the documents to OpenAI slowly. I expected Chroma to have a built-in rate limiter, but I could not find such a thing. The code below did it for me.

After you have created your database:

import time

# embed and store one chunk per request, then pause to stay under the rate limit
for splitted_document in all_splits:
    vectorstore.from_documents(documents=[splitted_document], embedding=OpenAIEmbeddings(), persist_directory=base_path)
    time.sleep(60)  # wait a minute before sending the next chunk
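Note that waiting a full minute after every single chunk is quite conservative. Since add_documents appends to an existing Chroma collection rather than overwriting it, a variant of the same idea is to add larger batches to the store you already have; batch_size here is just a placeholder to tune against your own tokens-per-minute limit:

import time

batch_size = 100  # placeholder: tune against your tokens-per-minute limit
for i in range(0, len(all_splits), batch_size):
    vectorstore.add_documents(all_splits[i:i + batch_size])  # appends to the existing collection
    time.sleep(60)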