
I have the following code:

from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import FAISS

chat_history = []
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0.1), db.as_retriever())
result = qa({"question": "What is stack overflow", "chat_history": chat_history})

The code creates embeddings, builds an in-memory FAISS vector DB from the text I have in the chunks array, then creates a ConversationalRetrievalChain and asks a question.

Based on my understanding of ConversationalRetrievalChain, when asked a question it will first query the FAISS vector DB and then, if it can't find anything matching, it will go to OpenAI to answer the question. (Is my understanding correct?)

How can I detect whether it actually called OpenAI to get the answer, or whether it was able to get it from the in-memory vector DB? The result object contains question, chat_history, and answer properties and nothing else.

AngryHacker

3 Answers


You can detect if the answer was obtained from the in-memory vector database by checking if the "answer" property exists and is not empty in the result object. If it's present, the answer came from the database; otherwise, it was generated by the OpenAI model.
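A minimal sketch of that check (assuming result comes from the chain call in the question):

# the chain returns a dict; "answer" is its default output key
answer = result.get("answer")
if answer:
    print("Answer:", answer)
else:
    print("No answer in the result")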


I personally don't think ConversationalRetrievalChain could get you an answer from the documents without sending an API request to OpenAI in the provided example. But I'm not an expert with it; I could be wrong.

But you could use another cheaper/local LLM to condense the final question, which helps reduce the token count.

Here is their example:

from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0, model="gpt-4"),
    vectorstore.as_retriever(),
    # a cheaper model condenses the follow-up question into a standalone question
    condense_question_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
)

One way to trace API usage is as follows:

from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)

Tokens Used: 42
    Prompt Tokens: 4
    Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
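Applied to the chain from the question, a minimal sketch: if cb.successful_requests is greater than zero, at least one call went to OpenAI.

from langchain.callbacks import get_openai_callback

# wrap the chain invocation to count OpenAI requests made during the call
with get_openai_callback() as cb:
    result = qa({"question": "What is stack overflow", "chat_history": chat_history})

print(f"Successful requests: {cb.successful_requests}")
print(f"Total tokens: {cb.total_tokens}")

Note that this will normally report at least one request, since the chain uses the LLM to compose the final answer from the retrieved chunks; the vector DB only supplies context.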

Another useful way is to use an additional tool to trace requests: https://github.com/amosjyng/langchain-visualizer
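A rough usage sketch based on that project's README (the visualize() entry point wrapping an async function is its documented pattern; treat the details as an assumption):

import langchain_visualizer

# wrap the chain call in an async function; visualize() runs it and
# opens a trace UI showing every LLM request the chain makes
async def qa_demo():
    return qa({"question": "What is stack overflow", "chat_history": chat_history})

langchain_visualizer.visualize(qa_demo)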

Idcore

Hi, you can apply for access to https://smith.langchain.com/ to visually trace the ConversationalRetrievalChain.

See the image: (screenshot of the LangSmith trace, showing the chain's two LLM calls)

Here I'm using AzureChatOpenAI. The first call of the LLMChain is for "Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language."

The second call is for your specific prompt or the LangChain default prompt.

In addition, you can set verbose=True on ConversationalRetrievalChain.from_llm to see what is happening.
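For example, re-creating the chain from the question with verbose output:

qa = ConversationalRetrievalChain.from_llm(
    OpenAI(temperature=0.1),
    db.as_retriever(),
    verbose=True,  # logs the prompts sent to the LLM at each step
)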

Hope it helps. Regards.

DSgUY