
I have read through a lot of documentation but am still confused about the retriever part.

I am building a chatbot that answers questions over each user's own custom data.

  1. User will feed the data
  2. Data should be upserted to Pinecone
  3. Then later user can chat with their data
  4. There can be multiple users, and each user should only be able to chat with their own data.

I am currently following the approach below.

  1. Storing user data in Pinecone
def doc_preprocessing(content):
    doc = Document(page_content=content)
    text_splitter = CharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=0
    )
    docs_split = text_splitter.split_documents([doc])
    return docs_split

def embedding_db(user_id, content):
    docs_split = doc_preprocessing(content)
    # Extract text from the split documents
    texts = [doc.page_content for doc in docs_split]
    vectors = embeddings.embed_documents(texts)

    # Store vectors with user_id as metadata
    for i, vector in enumerate(vectors):
        upsert_response = index.upsert(
            vectors=[
                {
                    'id': f"{user_id}",
                    'values': vector,
                    'metadata': {"user_id": str(user_id)}
                }
            ]
        )

This should create embeddings for the given data and store them in Pinecone.

The second part is chatting with this data. For QA, I have the following:
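
Two details of the upsert loop above may be worth checking: every chunk is written under the same id (`f"{user_id}"`), so each upsert replaces the previous one, and the metadata carries no `text` entry. The sketch below shows one way the records could be built so that each chunk has its own id and keeps its text. The helper name `build_upsert_records` and the `"{user_id}-{i}"` id scheme are assumptions made for illustration, not part of the original code, and the sample texts and vectors are stand-ins for the output of `doc_preprocessing()` and `embeddings.embed_documents()`.

```python
from typing import Dict, List, Sequence


def build_upsert_records(
    user_id: str,
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    text_field: str = "text",
) -> List[Dict]:
    """Pair each chunk with its vector under a unique id (hypothetical helper)."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    records = []
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        records.append(
            {
                # One id per chunk; a shared id would overwrite earlier chunks.
                "id": f"{user_id}-{i}",
                "values": list(vector),
                # Keep the chunk text in metadata under the field the
                # vector store is later told to read ("text" in the question).
                "metadata": {"user_id": str(user_id), text_field: text},
            }
        )
    return records


# Stand-in data for illustration only.
records = build_upsert_records(
    "42", ["first chunk", "second chunk"], [[0.1, 0.2], [0.3, 0.4]]
)
print([r["id"] for r in records])

# With the real index from the question (not executed here):
# index.upsert(vectors=records)
```

Note that this id scheme would still collide if the same user uploads a second document, so a real implementation would probably need a document identifier or a random suffix in the id as well.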

def retrieval_answer(user_id, query):
    text_field = "text"
    vectorstore = Pinecone(
        index, embeddings.embed_query, text_field
    )

    vectorstore.similarity_search(
        query,
        k=10,
        filter={
            "user_id": str(user_id)
        },
    )

    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type='stuff',
        retriever=vectorstore.as_retriever(),
    )
    result = qa.run(query)
    print("Result:", result)
    return result

But I keep getting:

Found document with no `text` key. Skipping.

When I run QA, it does not refer to the data stored in Pinecone; it just answers like plain ChatGPT. I am not sure what I am missing here.
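
In `retrieval_answer` above, the `user_id` filter is only passed to a `similarity_search` call whose result is discarded, while the retriever handed to `RetrievalQA` is created without any filter. A minimal sketch of passing the same settings through the retriever instead, assuming `as_retriever` accepts `search_kwargs` and forwards them to the underlying search; the helper name `user_search_kwargs` is hypothetical.

```python
from typing import Any, Dict


def user_search_kwargs(user_id: Any, k: int = 10) -> Dict[str, Any]:
    """Search settings restricting retrieval to one user's vectors (hypothetical helper)."""
    return {"k": k, "filter": {"user_id": str(user_id)}}


search_kwargs = user_search_kwargs(42)
print(search_kwargs)

# Intended use with the objects from the question (not executed here):
# retriever = vectorstore.as_retriever(search_kwargs=search_kwargs)
# qa = RetrievalQA.from_chain_type(
#     llm=llm, chain_type="stuff", retriever=retriever
# )
```

The filter only helps if the stored vectors also carry their chunk text in metadata under the `text` field; otherwise the matches would presumably still be skipped with the warning shown above.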

desertnaut
Manoj ahirwar
  • You need to debug this step by step. As a quick check, call `retriever.get_relevant_documents(query)` and see whether it returns any data. – simpleApp Aug 17 '23 at 03:03
  • This should help you resolve the issue: https://github.com/langchain-ai/langchain/issues/3460 – ZKS Aug 19 '23 at 17:37
  • This might help: an implementation using LangChain, [PrivateDocBot](https://github.com/Abhi5h3k/PrivateDocBot). – Abhi Aug 27 '23 at 15:31
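
The quick check suggested in the first comment could be sketched roughly as follows. `FakeDocument` is a stand-in defined here only so the snippet runs on its own, and `summarize_retrieved` is a hypothetical helper; with the real objects, the documents would come from `retriever.get_relevant_documents(query)`.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes read below."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def summarize_retrieved(docs: Iterable, user_id: str) -> List[str]:
    """Return one line per retrieved document, flagging other users' data."""
    lines = []
    for doc in docs:
        owner = doc.metadata.get("user_id")
        flag = "ok" if owner == str(user_id) else "WRONG USER"
        lines.append(f"[{flag}] user_id={owner} text={doc.page_content[:60]!r}")
    return lines


# Stand-in results; replace with:
# docs = retriever.get_relevant_documents(query)
docs = [
    FakeDocument("hello", {"user_id": "42"}),
    FakeDocument("other", {"user_id": "7"}),
]
report = summarize_retrieved(docs, "42")
print("\n".join(report) if report else "retriever returned no documents")
```

An empty result would suggest that nothing usable is coming back from the index, while lines flagged as belonging to another user would suggest that the filter is not being applied.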
