I want to parse a 25 MB JSON file using TypeScript, then filter and sort the objects. The code I wrote takes minutes and sometimes times out. I'm not sure why this is happening, or whether there is a more efficient way to write it.

Note: the code worked on a small file

import fs from 'fs';
searchAccounts(): Promise<Account[]> {
       const accountSearchCriteria: AccountSearchCriteria = {
                country: 'NZ',
                mfa: 'SMS',
                name: 'TEST',
                sortField: 'dob'
        };
        const jsonPath = './src/file.json';
        const rawAccounts = fs.readFileSync(jsonPath, 'utf-8');
        let accounts: Account[] = JSON.parse(rawAccounts);
        if (accountSearchCriteria) {
            if (accountSearchCriteria.name) {
                accounts = accounts.filter(
                    account =>
                        account.firstName.toLowerCase() ===
                            accountSearchCriteria.name.toLowerCase() ||
                        account.lastName.toLowerCase() ===
                            accountSearchCriteria.name.toLowerCase()
                );
            }
            if (accountSearchCriteria.country) {
                accounts = accounts.filter(
                    account =>
                        account.country.toLowerCase() ===
                        accountSearchCriteria.country.toLowerCase()
                );
            }
            if (accountSearchCriteria.mfa) {
                accounts = accounts.filter(
                    account => account.mfa === accountSearchCriteria.mfa
                );
            }
            if (accountSearchCriteria.sortField) {
                accounts.sort((a, b) => {
                    return a[accountSearchCriteria.sortField] <
                        b[accountSearchCriteria.sortField]
                        ? -1
                        : 1;
                });
            }
            return accounts;
        }
        return accounts;
}
basel.ai
  • Does this answer your question? [How to parse JSON string in Typescript](https://stackoverflow.com/questions/38688822/how-to-parse-json-string-in-typescript) – Derek Lawrence Sep 09 '21 at 00:49
  • No, it doesn't. I want to parse a big file using streams. @DerekLawrence – basel.ai Sep 09 '21 at 00:50
  • fs.readFileSync() reads the full content of the file into memory before returning the data. This means that big files will have a major impact on memory consumption and execution speed. In this case, a better option is to read the file content using streams. See https://medium.com/@dalaidunc/fs-readfile-vs-streams-to-read-text-files-in-node-js-5dd0710c80ea – Nonik Sep 09 '21 at 01:05
  • https://github.com/dominictarr/JSONStream – Derek Lawrence Sep 09 '21 at 01:07
  • If you remove your sort does it work fine every time? – vaira Sep 09 '21 at 01:30

2 Answers

Since your data size is 25 MB, you should use a more memory-efficient sorting algorithm.

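You can try cycle sort, an in-place algorithm that minimizes the number of writes. Keep in mind that it needs O(n²) comparisons, so time it against the built-in sort on your data.

You can find a cycle-sort implementation and use it in your code to see if there is a difference.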
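
To make the suggestion concrete, here is a minimal sketch of cycle sort in TypeScript. The function name `cycleSort` and its comparator parameter are illustrative assumptions, not the API of any particular package.

```typescript
// Hypothetical helper: sorts `arr` in place using cycle sort and returns
// the number of writes made to the array.
function cycleSort<T>(arr: T[], compare: (a: T, b: T) => number): number {
  let writes = 0;

  // Count the elements after `start` that sort before `item`; that count
  // gives the final position of `item`. Duplicates of `item` are skipped.
  const findPosition = (item: T, start: number): number => {
    let pos = start;
    for (let i = start + 1; i < arr.length; i++) {
      if (compare(arr[i], item) < 0) pos++;
    }
    return pos;
  };

  for (let start = 0; start < arr.length - 1; start++) {
    let item = arr[start];
    let pos = findPosition(item, start);
    if (pos === start) continue; // already in its final place

    while (compare(item, arr[pos]) === 0) pos++;
    let displaced = arr[pos];
    arr[pos] = item;
    item = displaced;
    writes++;

    // Rotate the rest of the cycle until something lands back on `start`.
    while (pos !== start) {
      pos = findPosition(item, start);
      while (compare(item, arr[pos]) === 0) pos++;
      displaced = arr[pos];
      arr[pos] = item;
      item = displaced;
      writes++;
    }
  }
  return writes;
}
```

With the question's data this could be called as `cycleSort(accounts, (a, b) => (a.dob < b.dob ? -1 : a.dob > b.dob ? 1 : 0))`, assuming `dob` is a comparable string. Timing that call next to `accounts.sort(...)` on the real file would show whether it helps.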

vaira

Node.js is single-threaded: if your code blocks the thread for a long time, you will get a timeout error. There are two problems with your code.

  1. You are using fs.readFileSync(jsonPath, 'utf-8'); it is a synchronous function and blocks the thread while reading the file. Use the asynchronous fs.readFile('./Index.html', callback) instead:
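const fs = require('fs');
// Asynchronous read: the callback runs once the whole file has been loaded.
// Passing 'utf-8' makes `data` a string instead of a Buffer.
fs.readFile('./Index.html', 'utf-8', function read(err, data) {
   if (err) {
       throw err;
   }
   console.log(data);
});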
  2. Sorting is also a thread-blocking task. Try a different sorting technique that doesn't occupy the thread for a long time.

Note: Node.js is not well suited to CPU-bound tasks such as sorting or image processing. It is better suited to I/O-bound tasks.
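
Applied to the code in the question, the two points might look like the sketch below. The `Account` and `AccountSearchCriteria` shapes are assumptions, since the question does not show them, and `filterAndSort` is a hypothetical helper name. The file is read with the promise-based `fs.promises.readFile`, the three filters are merged into one pass with the criteria lowercased once, and the comparator returns 0 for equal values. `JSON.parse` and the sort still run synchronously, so this only removes the blocking read and the repeated work.

```typescript
import { promises as fsp } from 'node:fs';

// Assumed shapes: the question does not include these type definitions.
interface Account {
  firstName: string;
  lastName: string;
  country: string;
  mfa: string;
  dob: string;
}

interface AccountSearchCriteria {
  name?: string;
  country?: string;
  mfa?: string;
  sortField?: keyof Account;
}

// Hypothetical helper: a single filter pass followed by one sort.
function filterAndSort(accounts: Account[], criteria: AccountSearchCriteria): Account[] {
  // Lowercase the criteria once instead of once per account.
  const name = criteria.name?.toLowerCase();
  const country = criteria.country?.toLowerCase();
  const mfa = criteria.mfa;

  // filter() returns a new array, so the input is left untouched.
  const result = accounts.filter(account =>
    (!name ||
      account.firstName.toLowerCase() === name ||
      account.lastName.toLowerCase() === name) &&
    (!country || account.country.toLowerCase() === country) &&
    (!mfa || account.mfa === mfa)
  );

  const field = criteria.sortField;
  if (field) {
    // Return 0 for equal values so the comparator is consistent.
    result.sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0));
  }
  return result;
}

// Non-blocking read; parsing and sorting still happen on the main thread.
export async function searchAccounts(
  jsonPath: string,
  criteria: AccountSearchCriteria
): Promise<Account[]> {
  const raw = await fsp.readFile(jsonPath, 'utf-8');
  const accounts: Account[] = JSON.parse(raw);
  return filterAndSort(accounts, criteria);
}
```

A call could look like `await searchAccounts('./src/file.json', { country: 'NZ', mfa: 'SMS', name: 'TEST', sortField: 'dob' })`. If parsing or sorting still blocks for too long, moving that step off the main thread would be the next thing to try.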

Shakir Aqeel