
I am currently working on a chatbot for our website that provides domain knowledge using LlamaIndex and ChatGPT. Our chatbot uses around 50 documents, each around 1-2 pages long, containing tutorials and other information from our site. While the answers I'm getting are great, the performance is slow. On average, it takes around 15-20 seconds to retrieve an answer, which is not practical for our website.

I have tried using Optimizers, as suggested in the documentation, but haven't seen much improvement. Currently, I am using GPTSimpleVectorIndex and haven't tested other indexes yet. I have tried running the bot on different machines and haven't seen a significant improvement in performance, so I don't think it's a hardware limitation.

I am looking for suggestions on how to improve the performance of the bot so that it can provide answers more quickly.

Thank you!

Code:

import os
import sys
import streamlit as st
from llama_index import (LLMPredictor, GPTSimpleVectorIndex,
                         SimpleDirectoryReader, PromptHelper, ServiceContext)
from langchain import OpenAI

os.environ["OPENAI_API_KEY"] = ...
# sys.argv[1] is a string, so any non-empty value (even "False") would be
# truthy; compare it explicitly instead.
retrain = len(sys.argv) > 1 and sys.argv[1].lower() == "true"
doc_path = 'docs'
index_file = 'index.json'
st.title("Chatbot")

def ask_ai():
    st.session_state.response = index.query(st.session_state.prompt)

index = None
if retrain:
    documents = SimpleDirectoryReader(doc_path).load_data()
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=128))
    num_output = 256
    max_chunk_overlap = 20
    max_input_size = 4096
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(
        documents, service_context=service_context
    )
    index.save_to_disk(index_file)
elif os.path.exists(index_file):
    index = GPTSimpleVectorIndex.load_from_disk(index_file)

if 'response' not in st.session_state:
    st.session_state.response = ''

if index is not None:
    st.text_input("Ask something: ", key='prompt')
    st.button("Send", on_click=ask_ai)
    if st.session_state.response:
        st.subheader("Response: ")
        st.success(st.session_state.response)
Aggamarcel
  • Did you write the chatbot? If so, include the code as a [mcve] in your question. Use a profiler to find where it spends its time. If you are just using someone else's software, your question isn't about programming and hence off-topic here. See [help/on-topic] and [ask]. – Robert Apr 25 '23 at 14:43
  • Thanks for your answer @Robert. I have updated my question to include a reproducible example. My question is related to the performance of the llama-index package, as I am experiencing long response times and would like to confirm that I am using the package correctly. I saw some similar questions and I thought it was ok to ask but please, let me know if this is not the right place. – Aggamarcel Apr 25 '23 at 16:10

1 Answer


Streamlit is stateless by default: it reruns the entire script on every interaction. That means if your retrain argument is set, document loading and indexing run again on every single button click or text input. If you want it to reindex only on startup, store a retrain flag in st.session_state and set it to False at the end of your retrain block.
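A minimal sketch of the session_state idea described above. Streamlit reruns the whole script on every interaction; st.session_state is the one object that survives those reruns. Here a plain dict stands in for st.session_state (and the flag name "retrained" is my own choice) so the rerun behavior can be demonstrated without a running Streamlit app:

```python
session_state = {}   # persists across reruns, like st.session_state
index_builds = []    # records each expensive reindex, for demonstration

def run_script(retrain):
    """One Streamlit rerun: executed on startup and after every interaction."""
    if retrain and not session_state.get("retrained", False):
        index_builds.append("built index")   # the expensive indexing work
        session_state["retrained"] = True    # skip reindexing on later reruns

run_script(retrain=True)   # startup: the index is built once
run_script(retrain=True)   # user clicks "Send": rerun, but no rebuild
run_script(retrain=True)   # another interaction: still no rebuild
print(len(index_builds))   # 1
```

In the real app you would replace the dict with st.session_state itself and put the SimpleDirectoryReader/GPTSimpleVectorIndex calls inside the guarded block.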

vinhddinh