I'm creating a JavaScript app with a drop area where you can drop files from your drive. When files are dropped, I get an array of File objects. Now I want to use a LangChain document loader to load these files and then split them into chunks. This is the function I have so far:

import { TextLoader } from 'langchain/document_loaders/fs/text'
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter'
import { Document } from 'langchain/document'

export async function IngestFiles (files) {
  if (files.length < 1) return

  console.log('files', files)

  const splitter = new RecursiveCharacterTextSplitter(
    { chunkSize: 100, chunkOverlap: 10 }
  )

  let documents = []
  files.forEach(async file => {
    const loader = new TextLoader(file)
    const doc = await loader.load()
    const docOutput = await splitter.splitDocuments([
      new Document({ pageContent: doc[0].pageContent })
    ])
    documents = documents.concat(docOutput)

    console.log('documents', documents)
  })

  console.log('result', documents)

  return documents
}

I have added some console.log lines to be able to see the intermediate steps:

(screenshot of the console output)

As you can see, I added two small txt files. They are properly loaded and split into smaller Document objects, but the final result (the last console.log) is empty. I've tried everything, and all I can think of now is that this is related to async/await, but I can't see the issue.

Any help is appreciated

Fran Casadome

1 Answer

I think this post answers your question: https://stackoverflow.com/a/70946414/9787476

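As that post suggests, use a for...of loop instead of forEach. forEach does not wait for the promises returned by an async callback, so the function reaches its return statement before any file has been loaded.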
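To make the difference concrete, here is a minimal sketch that does not use LangChain at all. The names fakeLoad, ingestWithForEach and ingestWithForOf are hypothetical and only for illustration; fakeLoad is an assumed stand-in for any async step such as loader.load().

```javascript
// Hypothetical stand-in for an async loader call such as loader.load().
// It resolves after a short delay with a one-element array.
function fakeLoad (name) {
  return new Promise(resolve => {
    setTimeout(() => resolve([`${name}-chunk`]), 10)
  })
}

// forEach discards the promise returned by each async callback,
// so this function returns the still-empty array right away.
async function ingestWithForEach (names) {
  let documents = []
  names.forEach(async name => {
    const chunks = await fakeLoad(name)
    documents = documents.concat(chunks)
  })
  return documents
}

// for...of runs inside the async function itself, so each await
// pauses the loop until that item has been loaded.
async function ingestWithForOf (names) {
  let documents = []
  for (const name of names) {
    const chunks = await fakeLoad(name)
    documents = documents.concat(chunks)
  }
  return documents
}
```

Calling `await ingestWithForOf(['a.txt', 'b.txt'])` gives an array with one entry per input, in order, while the forEach version resolves to an empty array. In the original function the same change would mean replacing `files.forEach(async file => { ... })` with `for (const file of files) { ... }`.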

Also, is there a specific reason to use:

const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: doc[0].pageContent })
])

instead of simply

const docOutput = await splitter.splitDocuments(doc)
Jordy
Thank you @Jordy, that definitely solved my problem. I guess I need to read more JS documentation before jumping into coding. As for your question, the answer is: none whatsoever. It was just me trying different things, not knowing the problem was the forEach. – Fran Casadome Jun 16 '23 at 12:26