I have to fetch about 30 files with ES6; each of them consists of 100 MB of text lines.

I parse the text, line by line, counting some data points. The result is a small array like

[{"2014":34,"2015":34,"2016":34,"2017":34,"2018":12}]
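The line-by-line counting described above could look roughly like this. This is only a sketch: the line format (each record starting with a four-digit year) is an assumption made for illustration, since the real file layout is not shown, and the name processText simply mirrors the helper used later in this question.

```javascript
// Hypothetical line format: "<YYYY>-<MM>-<DD> <rest of record>".
// Only the 4-digit year at the start of each line is counted.
function processText(text) {
  const counts = {};
  for (const line of text.split("\n")) {
    const match = /^(\d{4})\b/.exec(line);
    if (!match) continue; // skip blank or malformed lines
    const year = match[1];
    counts[year] = (counts[year] || 0) + 1;
  }
  return counts;
}
```

Note that text.split("\n") builds a second large array next to the original string, so for 100 MB inputs this simple version roughly doubles the memory needed per file.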

I'm running into memory problems while parsing the files (Chrome simply crashes the debugger), probably because I am parsing them all with map:

return Promise.all(filenamesArray.map( /* fetch each file in filenamesArray */ ))
    .then(() => { /* parse them all */ })
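To see why the map version keeps everything in memory at once, it helps to count how many tasks are in flight. The sketch below uses a hypothetical fakeLoad (a timer standing in for fetch plus parsing, not real network code) and compares Promise.all over map with a loop that awaits each task.

```javascript
// Hypothetical stand-in for fetch + parse: tracks how many tasks are in flight.
let inFlight = 0;
let peak = 0;

async function fakeLoad(name) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  // Simulate network latency; a real task would hold the file text here.
  await new Promise(resolve => setTimeout(resolve, 5));
  inFlight--;
  return name.length;
}

// map() calls fakeLoad for every name immediately, so all tasks overlap.
async function loadAllAtOnce(names) {
  inFlight = 0;
  peak = 0;
  await Promise.all(names.map(name => fakeLoad(name)));
  return peak;
}

// Awaiting inside the loop starts the next task only after the previous one ends.
async function loadOneByOne(names) {
  inFlight = 0;
  peak = 0;
  for (const name of names) {
    await fakeLoad(name);
  }
  return peak;
}
```

With three names, loadAllAtOnce resolves to a peak of 3 while loadOneByOne resolves to 1. With 30 real files of 100 MB each, that difference is what decides how much text is alive at the same time.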

I'm not posting the full code because I know it's wrong anyway. What I would like to do is

  1. Load a single file with fetch
  2. Parse its text with a result array such as above
  3. Return the result array and store it somewhere until every file has been parsed
  4. Give the js engine / gc enough time to clear the text from step 1 from memory
  5. Load the next file (continue with 1, but only after steps 1-4 are finished!).

but I can't seem to find a solution for that. Could anyone show me an example? I don't care if it's promises, callback functions, async/await... as long as each file is parsed completely before the next one is started.

EDIT 2020825

Sorry for my late update, I only got around to fixing my problem now. While I appreciate Josh Lind's answer, I realized that I still have a problem with the async nature of fetch, which I apparently did not describe well enough: how do I deal with promises to make sure one file is finished and its memory may be released? I implemented Josh's solution with Promise.all, only to discover that this would still load all files first and then start processing them. Luckily, I found another SO question with almost the same problem:

Resolve promises one after another (i.e. in sequence)?

and so I learned about async functions. In order to use them with fetch, this question helped me:

How to use fetch with async/await?

So my final code looks like this:

//returns a promise resolving with an array of all processed files
async function loadAndCountFiles(filenamesArray) {
    console.log("starting filecount...");
    const resultArray = [];
    //await inside for...of: the next fetch only starts once this file is processed
    for (const filename of filenamesArray) {
        const response = await fetch(filename);
        const text = await response.text();
        //process the text and return a much smaller extract
        const yearCountObject = processText(text);
        resultArray.push({
            filename: filename,
            yearCountObject: yearCountObject
        });
        console.log("processed file " + filename);
    }
    //stringify, otherwise the objects print as [object Object]
    console.log("done: " + JSON.stringify(resultArray));
    //an async function already returns a promise, so no new Promise wrapper is needed
    return resultArray;
}

Now every file is fetched and processed before the next.
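The loop above still holds one complete file as a string while it is parsed. If that is still too much, the response body can be read in chunks instead of calling response.text(). The following is a sketch only: forEachLine and onLine are hypothetical names, and it assumes UTF-8 text with "\n" line endings and a browser that exposes response.body as a ReadableStream.

```javascript
// Reads a fetch Response body chunk by chunk and calls onLine for each line,
// so the whole file never has to exist as one string.
async function forEachLine(response, onLine) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder("utf-8");
  let buffered = ""; // holds the trailing partial line between chunks
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split("\n");
    buffered = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) onLine(line);
  }
  buffered += decoder.decode(); // flush any bytes held back by the decoder
  if (buffered.length > 0) onLine(buffered);
}
```

Inside the loop this would replace the response.text() call, for example await forEachLine(response, line => { /* count the year in this line */ }), so only the current chunk and one partial line are kept in memory.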

user1840267
  • What if the code you have is almost correct? You want to throw all of that away and make volunteers do all of your work for you because it *might* be wrong? Also, there are many questions about each step of this; do you have questions about those solutions? – Heretic Monkey Jun 08 '20 at 15:05
  • Heretic Monkey, my code has about 100 lines and does it wrong, I know this. It loads all file contents first, then parses them one way, then loads the files again and parses them a different way. I have been trying for hours to restructure that code along steps 1-5 and failed to produce anything without errors. If you really want I will post my code, but I really think reading it would just be a waste of time for everyone. I think I just need an example for steps 1, 3 and 5. I will gladly post my final solution once I have figured it out. – user1840267 Jun 08 '20 at 15:27
  • I don't need a full solution, my main problem is that I don't understand how I can fetch and parse, mark it for gc, then fetch the next. So synchronously asynchronously so to speak :) – user1840267 Jun 08 '20 at 15:29

1 Answer

Global variable:

const dictionary = {};

In main:

fileNamesArray.forEach(fname => readFile(fname));

Functions:

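const readFile = (fname) => {
  // return the promise chain so the caller can wait for this file
  return fetch(fname)
    .then(response => response.text())
    .then(text => {
      for (const line of text.split("\n")) {
        /* parse line; when you find a year (a string), call addToDict(year) */
      }
    });
}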

const addToDict = (key) => {
  if (dictionary[key]) dictionary[key]++;
  else dictionary[key] = 1;
}
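
Note that forEach starts every readFile call right away, so all files are still requested at the same time. One way to run them strictly one after another without async/await is to chain the promises with reduce. This is a sketch under the assumption that readFile returns its promise; runInSequence is a hypothetical helper name.

```javascript
// Runs task(item) for each item strictly one after another.
// `task` is assumed to return a promise, e.g. a readFile that returns its fetch chain.
function runInSequence(items, task) {
  return items.reduce(
    // each step waits for the chain so far, then starts the next task
    (chain, item) => chain.then(() => task(item)),
    Promise.resolve()
  );
}
```

Usage would be something like runInSequence(fileNamesArray, readFile).then(() => console.log(dictionary)); the final then runs only after the last file has been handled.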
Josh Lind