
I have an Azure Blob Storage account into which I dropped a single file. I then want to add a record to Pinecone based on this file, using LangChain:

import logging
import os

import openai
import pinecone
from langchain.document_loaders import AzureBlobStorageContainerLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# `app` is the web framework's application object (e.g. FastAPI); its setup is not shown here

@app.get("/BlobStorage")
def IndexContainer(storageContainer: str, indexName: str, namespace_name: str):
    logging.info('Python HTTP trigger function IndexContainer processed a request.')

    try:
        pinecone.init(
            api_key=os.getenv("pineconeapikey"),
            environment=os.getenv("pineconeenvironment")
        )

        connect_str = os.getenv('blobstorageconnectionstring')
        loader = AzureBlobStorageContainerLoader(conn_str=connect_str, container=storageContainer)
        embeddings = OpenAIEmbeddings(deployment=os.getenv("openai_embedding_deployment_name"),
                                        model=os.getenv("openai_embedding_model_name"),
                                        chunk_size=1)

        openai.api_type = "azure"
        openai.api_version = os.getenv("openai_api_version")
        openai.api_base = os.getenv("openai_api_base")
        openai.api_key = os.getenv("openai_api_key")

        documents = loader.load()
        texts = []
        metadatas = []

        for doc in documents:
            texts.append(doc.page_content)
            metadatas.append(doc.metadata['source'])

        docsearch = Pinecone.from_texts(
            texts,
            embeddings,
            index_name=indexName,
            metadatas=metadatas,
            namespace=namespace_name
        )

        return {
            "message": "File indexed. This HTTP triggered function executed successfully."
        }
    except Exception as e:
        error_message = f"An error occurred: {str(e)}"
        logging.exception(error_message)
        return {
            "message":f"Error : {error_message}."
        }

I debugged this code line by line; all the variables are set up correctly, and the exception is not thrown until the from_texts call.

However, I get this error:

An error occurred: 'str' object does not support item assignment.

The output of loader.load() is below:

[Document(page_content="long content", metadata={'source': 'C:\\Users\\xx\\AppData\\Local\\Temp\\tmpk5cqh4nd/abc/filenameAI sorting_Project Description.docx'})]

Stack trace

[Screenshot of the stack trace]

What am I missing?

I based my code on the integration tests of LangChain:

https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/tests/integration_tests/vectorstores/test_pinecone.py#L118

If it helps, the Azure Blob Storage container loader from LangChain is implemented here:

https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/langchain/document_loaders/azure_blob_storage_container.py

  • Can you add the full stacktrace? What is `loader`? What does `loader.load()` return? – C.Nivs Jul 05 '23 at 13:42
  • Out of curiosity, did you manage to resolve the [related question](https://stackoverflow.com/q/76605471/6699447)? – Abdul Niyas P M Jul 05 '23 at 13:43
  • Please show whole traceback - you logged it with logging.exception. With that we'll actually know what line is wrong – h4z3 Jul 05 '23 at 13:45
  • You still have not provided the full traceback! – Matteo Zanoni Jul 05 '23 at 13:48
  • I added a screenshot of the watch window in vscode for variable e, – Luis Valencia Jul 05 '23 at 13:48
  • That is not what we need... The traceback should be printed by `logging.exception` – Matteo Zanoni Jul 05 '23 at 13:50
  • not sure how to get it? – Luis Valencia Jul 05 '23 at 13:52
  • my code is based from: https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/tests/integration_tests/vectorstores/test_pinecone.py#L118 – Luis Valencia Jul 05 '23 at 13:53
  • If it helps, the Blob Storage Container loader from Langchain is implemented here: https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/langchain/document_loaders/azure_blob_storage_container.py – Luis Valencia Jul 05 '23 at 14:04
  • Please review *[Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/questions/285551/)* (e.g., *"Images should only be used to illustrate problems that* ***can't be made clear in any other way,*** *such as to provide screenshots of a user interface."*) and [do the right thing](https://stackoverflow.com/posts/76620938/edit). Thanks in advance. – Peter Mortensen Jul 06 '23 at 22:16

2 Answers


Your problem is in the metadatas variable. Looking at the source code, it seems that metadatas needs to be a list of dictionaries; you are passing a list of strings instead.

Just fix it by:

        for doc in documents:
            texts.append(doc.page_content)
            metadatas.append(doc.metadata)
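
For context, here is a minimal sketch of the shape Pinecone.from_texts then expects: texts and metadatas are parallel lists, with one string and one dict per document. The index and namespace names below are placeholders, and documents and embeddings are the objects from the question's code:

# One page_content string and one metadata dict per document; the lists must stay parallel.
texts = [doc.page_content for doc in documents]
metadatas = [doc.metadata for doc in documents]   # e.g. {'source': 'path/to/blob.docx'} (placeholder path)

docsearch = Pinecone.from_texts(
    texts,
    embeddings,
    metadatas=metadatas,          # list of dicts, not a list of strings
    index_name="my-index",        # placeholder
    namespace="my-namespace",     # placeholder
)
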
Matteo Zanoni
  • that was my first attempt before posting the question here, however the error is different: '{"code":3,"message":"metadata size is 140376 bytes, which exceeds the limit of 40960 bytes per vector","details":[]}' – Luis Valencia Jul 05 '23 at 14:03
  • If it helps, the Blob Storage Container loader from Langchain is implemented here: https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/langchain/document_loaders/azure_blob_storage_container.py – Luis Valencia Jul 05 '23 at 14:04

The error 'str' object does not support item assignment typically occurs when you try to modify a string object, which is immutable in Python. Looking at your code, the issue seems to be with the assignment of metadatas in the for loop.
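
For illustration, this is the kind of operation that raises exactly that message:

s = "hello"
s[0] = "H"   # TypeError: 'str' object does not support item assignment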

The documents list contains Document objects with a metadata attribute, but in your code, you're trying to access the metadata using doc.metadata['source']. However, since metadata is a string object, you cannot access its elements like a dictionary.

To resolve this, you need to ensure that metadata is in a dictionary format. If metadata is a JSON string, you can parse it using json.loads() to convert it into a dictionary. Here's an example of how you can modify your code:

import json

# ...

documents = loader.load()
texts = []
metadatas = []

for doc in documents:
    texts.append(doc.page_content)
    metadata_dict = json.loads(doc.metadata)  # Parse the metadata JSON string into a dictionary
    metadatas.append(metadata_dict.get('source', ''))  # Access the 'source' key from the metadata dictionary

# ...

By parsing the metadata string into a dictionary, you can then access the 'source' key correctly. Make sure the metadata value is in a valid JSON format for this to work.

With this modification, you should be able to resolve the error and proceed with your code.

  • this error is shown when trying your code on jsonloads: metadatas.append(metadata_dict.get('source', '')) # Access the 'source' key from the metadata dictionary – Luis Valencia Jul 05 '23 at 13:52
  • if it helps, I based my code from here: https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/tests/integration_tests/vectorstores/test_pinecone.py#L118 – Luis Valencia Jul 05 '23 at 13:53
  • *"you're trying to access the metadata using doc.metadata['source']"* If this was the problem, the error message would be `TypeError: string indices must be integers`, but that's not the error that OP sees. – slothrop Jul 05 '23 at 13:57
  • If it helps, the Blob Storage Container loader from Langchain is implemented here: https://github.com/hwchase17/langchain/blob/e27ba9d92bd2cc4ac9ed7439becb2d32816fc89c/langchain/document_loaders/azure_blob_storage_container.py – Luis Valencia Jul 05 '23 at 14:04
  • This answer looks like it was generated by an AI (like ChatGPT), not by an actual human being. You should be aware that [posting AI-generated output is officially **BANNED** on Stack Overflow](https://meta.stackoverflow.com/q/421831). If this answer was indeed generated by an AI, then I strongly suggest you delete it before you get yourself into even bigger trouble: **WE TAKE PLAGIARISM SERIOUSLY HERE.** Please read: [Why posting GPT and ChatGPT generated answers is not currently acceptable](https://stackoverflow.com/help/gpt-policy). – tchrist Jul 06 '23 at 00:33
  • **Readers should review this answer carefully and critically, as AI-generated information often contains fundamental errors and misinformation.** If you observe quality issues and/or have reason to believe that this answer was generated by AI, please leave feedback accordingly. – NotTheDr01ds Jul 06 '23 at 00:45