3

I searched the LangChain documentation on the official website but couldn't find how to create a LangChain document from a str variable in Python, so I searched their GitHub code and found this:

doc = Document(
    page_content="text",
    metadata={"source": "local"}
)

PS: I added the metadata attribute myself.
Then I tried using that doc with my chain.
Memory and chain:

memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
    llm, chain_type="stuff", memory=memory, prompt=prompt
)

The call:

chain({"input_documents": doc, "human_input": query})

Prompt template:

template = """You are a senior financial analyst analyzing the below document and having a conversation with a human.
{context}
{chat_history}
Human: {human_input}
senior financial analyst:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input", "context"], template=template
)

But I am getting the following error:

AttributeError: 'tuple' object has no attribute 'page_content'

When I checked the type and the page content of the Document object before passing it to the chain, I got this:

print(type(doc))
<class 'langchain.schema.Document'>
print(doc.page_content)
"text"


Mohamed Amine

4 Answers

13

This worked for me:

from langchain.docstore.document import Document

doc = Document(page_content="text", metadata={"source": "local"})
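
A likely cause of the AttributeError in the question is that `input_documents` is iterated as a list, while a single Document was passed. Pydantic models yield `(field_name, value)` tuples when iterated, which would explain the `'tuple' object` in the message. The sketch below reproduces that behaviour with a stand-in class so it runs without LangChain; the `Document` class and the `stuff_documents` helper here are assumptions for illustration, not the library's code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for LangChain's Document (an assumption, not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

    def __iter__(self):
        # Mimics pydantic models, which yield (field_name, value) tuples.
        yield ("page_content", self.page_content)
        yield ("metadata", self.metadata)


def stuff_documents(docs):
    """Hypothetical helper: join the page_content of every item in docs."""
    return "\n\n".join(d.page_content for d in docs)


doc = Document(page_content="text", metadata={"source": "local"})

error_message = None
try:
    # A single Document: iterating it yields tuples, not Documents.
    stuff_documents(doc)
except AttributeError as exc:
    error_message = str(exc)

# A list containing the Document: each item has page_content.
context = stuff_documents([doc])
```

With the chain from the question, the equivalent change would be passing `[doc]` instead of `doc` as `input_documents`.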

  • It seems that importing from `langchain.schema.document` also works, and that latter module is the one specified in the [API documentation](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html#langchain.schema.document.Document). – Sergii Volchkov Aug 20 '23 at 18:44
0

This is the best I could come up with:

import os

from langchain.document_loaders import TextLoader

def str_to_doc(text, name):
    # Write the text to docs/<name>.txt, then load it back with TextLoader.
    folder_name = "docs"
    os.makedirs(folder_name, exist_ok=True)
    path = os.path.join(folder_name, name + ".txt")
    with open(path, "w") as file:
        file.write(text)
    loader = TextLoader(path)
    # load() returns a list of Document objects.
    return loader.load()


Mohamed Amine
0

First, some context. From what I've learned so far, what you usually work with is a list of Document objects. If you run type(doc[0]) you get langchain.schema.document.Document. Each Document object has two fields: page_content, which accepts string values, and metadata, which only accepts dictionaries: {page_content: str, metadata: dict}. By default (don't quote me on this: it's been lots of trial and error and, as you mentioned, there is no documentation), the Documents I have seen contain those two fields, with a single key in metadata, source, whose value is a string. You can create a multi-"page" document by building a list of Document objects like so:

First, you need a list of text strings, text_list below, and a list of metadata dictionaries, metadata_list below. Both lists must be the same length.

from langchain.docstore.document import Document

document = []

# Pair each text with the metadata dict at the same position.
for text, metadata in zip(text_list, metadata_list):
    page = Document(page_content=text, metadata=metadata)
    document.append(page)
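
The same-length requirement can be checked up front rather than left implicit. Below is a minimal sketch of that idea; `build_documents` is a hypothetical helper name, and the fallback `Document` dataclass is only a stand-in used when LangChain is not installed, so the snippet runs either way.

```python
try:
    # Real class, when LangChain is installed (same import as above).
    from langchain.docstore.document import Document
except ImportError:
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        """Minimal stand-in with the same two fields (an assumption)."""
        page_content: str
        metadata: dict = field(default_factory=dict)


def build_documents(texts, metadatas):
    """Hypothetical helper: pair each text with its metadata dict."""
    if len(texts) != len(metadatas):
        raise ValueError(
            f"got {len(texts)} texts but {len(metadatas)} metadata dicts"
        )
    return [
        Document(page_content=text, metadata=meta)
        for text, meta in zip(texts, metadatas)
    ]


text_list = ["first page", "second page"]
metadata_list = [
    {"source": "local", "page": 1},
    {"source": "local", "page": 2},
]

document = build_documents(text_list, metadata_list)
```

Failing early with a ValueError avoids silently dropping the extra items, which is what a plain zip would do when the lists differ in length.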

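You can also create Document objects using any splitter from LangChain:

from langchain.text_splitter import CharacterTextSplitter

# Example parameter values; adjust them to your texts.
doc_creator = CharacterTextSplitter(separator="\n\n", chunk_size=1000, chunk_overlap=0)

# Returns a list of Document objects, one or more per input text.
document = doc_creator.create_documents(texts=text_list, metadatas=metadata_list)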
starball
-1

Try the code snippet below:

from langchain.schema.document import Document
doc = Document(page_content="text", metadata={"source": "local"})
Codemaker2015