How to use retriever in Langchain?

Question

I looked through a lot of documentation but am still confused about the retriever part.

I am building a chatbot on top of each user's custom data.

The user feeds in the data.
The data is upserted to Pinecone.
Later, the user can chat with their data.
There can be multiple users, and each user should only be able to chat with their own data.

I am currently following the approach below.

Storing user data in Pinecone:

def doc_preprocessing(content):
    doc = Document(page_content=content)
    text_splitter = CharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=0
    )
    docs_split = text_splitter.split_documents([doc])
    return docs_split

def embedding_db(user_id, content):
    docs_split = doc_preprocessing(content)
    # Extract text from the split documents
    texts = [doc.page_content for doc in docs_split]
    vectors = embeddings.embed_documents(texts)

    # Store vectors with user_id as metadata
    for i, vector in enumerate(vectors):
        upsert_response = index.upsert(
            vectors=[
                {
                    'id': f"{user_id}",
                    'values': vector,
                    'metadata': {"user_id": str(user_id)}
                }
            ]
        )

This should create embeddings for the given data and store them in Pinecone.
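A note on the record shape used in the upsert: the LangChain Pinecone wrapper reads each chunk's text from the metadata field named by its text key (`"text"` in this question), and every vector here is written under the same `id`, so later chunks overwrite earlier ones. The sketch below builds one record per chunk with a unique id and the chunk text in metadata. The helper name `build_upsert_records` and the `user_id-i` id scheme are assumptions for illustration, not part of the original code; if a user can upload more than one document, the id would also need a per-document component.

```python
from typing import Any, Dict, List


def build_upsert_records(
    user_id: Any, texts: List[str], vectors: List[List[float]]
) -> List[Dict[str, Any]]:
    """Build one Pinecone record per chunk (hypothetical helper)."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    records = []
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        records.append(
            {
                # Assumed id scheme: unique per chunk so chunks do not overwrite each other.
                "id": f"{user_id}-{i}",
                "values": vector,
                # Store the chunk text under "text" so it matches text_field = "text".
                "metadata": {"user_id": str(user_id), "text": text},
            }
        )
    return records


# Dummy two-dimensional vectors stand in for real embeddings here.
records = build_upsert_records(
    42, ["first chunk", "second chunk"], [[0.1, 0.2], [0.3, 0.4]]
)
# index.upsert(vectors=records)  # `index` is the Pinecone index from the question
```

In `embedding_db`, this would replace the per-vector loop: pass `texts` and `vectors` to the helper and upsert the returned list in one call.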

The second part is chatting with this data. For QA, I have the following:

def retrieval_answer(user_id, query):
    text_field = "text"
    vectorstore = Pinecone(
        index, embeddings.embed_query, text_field
    )

    vectorstore.similarity_search(
        query,
        k=10,
        filter={
            "user_id": str(user_id)
        },
    )

    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type='stuff',
        retriever=vectorstore.as_retriever(),
    )
    result = qa.run(query)
    print("Result:", result)
    return result

But I keep getting:

Found document with no `text` key. Skipping.

When I run QA, it does not refer to the data stored in Pinecone; it just answers like plain ChatGPT. I am not sure what I am missing here.
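Two things in the code above seem relevant. The warning is what the LangChain Pinecone wrapper logs when a matched vector has no `text` field in its metadata, and the upserted metadata only contains `user_id`. Separately, the result of `similarity_search` is discarded, and `as_retriever()` is called without arguments, so the chain's retriever never sees the `user_id` filter. A minimal sketch of passing the same `k` and filter through the retriever follows; `user_search_kwargs` is a hypothetical helper name, and the commented lines assume the `vectorstore` and `llm` objects from the question.

```python
from typing import Any, Dict


def user_search_kwargs(user_id: Any, k: int = 10) -> Dict[str, Any]:
    """Search options restricting retrieval to one user's vectors (hypothetical helper)."""
    return {"k": k, "filter": {"user_id": str(user_id)}}


search_kwargs = user_search_kwargs(42)

# Assumed wiring, using objects defined in the question:
# retriever = vectorstore.as_retriever(search_kwargs=search_kwargs)
# qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
```

The filter value is cast to `str` because the metadata was stored as `str(user_id)`; a type mismatch between the stored value and the filter would return no matches.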

You need to debug this step by step. As a quick check, call `retriever.get_relevant_documents(query)` and see whether it returns any data. — simpleApp, Aug 17 '23 at 03:03
This should help you resolve the issue: https://github.com/langchain-ai/langchain/issues/3460 — ZKS, Aug 19 '23 at 17:37
Here's something that might assist you: consider exploring this implementation using LangChain - you can find it at [PrivateDocBot](https://github.com/Abhi5h3k/PrivateDocBot) — Abhi, Aug 27 '23 at 15:31
