
I am trying to load a folder of JSON files in LangChain like this:

loader = DirectoryLoader(r'C:...')
documents = loader.load()

But I get this error message:

ValueError: Json schema does not match the Unstructured schema

Can anyone tell me how to solve this problem?

I tried passing glob='**/*.json', but that did not help. The documentation on the LangChain website is limited as well.

Zeeshan Hassan Memon
peiwb

2 Answers


If you want to read each file in full as plain text, pass TextLoader through the loader_cls parameter:

from langchain.document_loaders import DirectoryLoader, TextLoader

loader = DirectoryLoader(DRIVE_FOLDER, glob='**/*.json', show_progress=True, loader_cls=TextLoader)
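
As a rough illustration of what this approach gives you, the stdlib-only sketch below mimics reading every file matched by '**/*.json' as raw text and then parsing it with json.loads. It does not use LangChain; the function name read_json_texts is hypothetical, and the temporary folder only stands in for DRIVE_FOLDER.

```python
import json
import tempfile
from pathlib import Path


def read_json_texts(folder):
    """Return {relative path: raw text} for every *.json file under folder."""
    root = Path(folder)
    # rglob("*.json") also walks subfolders, like the pattern '**/*.json'
    return {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.json"))
    }


# Demo on a throwaway folder (a stand-in for DRIVE_FOLDER)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.json").write_text('{"content": "hello"}', encoding="utf-8")
    (Path(tmp) / "sub" / "b.json").write_text('{"content": "world"}', encoding="utf-8")
    texts = read_json_texts(tmp)

# Each value is the unparsed file body; parse it yourself if you need fields
parsed = {name: json.loads(body) for name, body in texts.items()}
print(len(texts))
```

The point of the sketch is that a plain-text load leaves the JSON unparsed, so any field extraction happens in your own code afterwards.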

Alternatively, use JSONLoader and pass a jq schema through loader_kwargs:

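from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders.json_loader import JSONLoader  # needs the jq package (pip install jq)

DRIVE_FOLDER = "/content/drive/MyDrive/Colab Notebooks/demo"

# loader_kwargs is forwarded to JSONLoader for every matched file;
# jq_schema='.content' selects the "content" field of each JSON file
loader = DirectoryLoader(
    DRIVE_FOLDER,
    glob='**/*.json',
    show_progress=True,
    loader_cls=JSONLoader,
    loader_kwargs={'jq_schema': '.content'},
)

documents = loader.load()

print(f'document count: {len(documents)}')
print(documents[0] if len(documents) > 0 else None)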

For the jq_schema syntax, see: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/json_loader.py#L10
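
To make the '.content' schema concrete: in jq, the expression .content selects the value stored under the top-level "content" key, and JSONLoader builds its documents from whatever the expression selects. The sketch below imitates only that simple dotted-key case with the standard library. It is neither jq nor JSONLoader's code, and select_path and the sample record are made up for illustration.

```python
import json


def select_path(data, schema):
    """Follow a dotted-key schema such as '.content' or '.meta.lang'.

    Handles plain object keys only; real jq supports far more and
    behaves differently on edge cases.
    """
    current = data
    for key in filter(None, schema.split(".")):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


# Made-up record standing in for one JSON file in the folder
record = json.loads(
    '{"title": "demo", "content": "text to index", "meta": {"lang": "en"}}'
)

content = select_path(record, ".content")  # value under "content"
lang = select_path(record, ".meta.lang")   # nested key
missing = select_path(record, ".body")     # key not present
print(content, lang, missing)
```

If your files keep the text under a different key, the schema has to name that key instead of .content.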

For more DirectoryLoader options, see: https://github.com/hwchase17/langchain/blob/master/langchain/document_loaders/directory.py

Zeeshan Hassan Memon
BuffK

You can use the DirectoryLoader class to load a folder of JSON files in LangChain. It takes the path to the folder as input, and its load() method returns a list of Document objects.

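from langchain.document_loaders import DirectoryLoader

folder_path = "/path/to/json/files"
# Note: with no loader_cls, DirectoryLoader falls back to UnstructuredFileLoader,
# which raises the ValueError from the question on JSON that is not in
# Unstructured's own schema; pass loader_cls (e.g. TextLoader) for arbitrary JSON.
directory_loader = DirectoryLoader(folder_path)
documents = directory_loader.load()

# Print the text of each loaded document
for document in documents:
    print(document.page_content)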
Codemaker2015