
I have a large JSON dataset, Dataset A (180,000 records), containing users' complete records, and another JSON dataset, Dataset B (a subset of A, about 1,500 records), containing only some users' unique IDs and names. I need to get the complete records for the users in Dataset B from Dataset A.

Here is what I've tried so far:

let detailedSponsoreApplicants = [];
for (let j = 0; j < allApplicants.length; j++) {
    let a = allApplicants[j];

    for (let i = 0; i < sponsoredApplicants.length; i++) {
        let s = sponsoredApplicants[i];
        if (s && s.number === a.applicationNumber) {
            detailedSponsoreApplicants.push(a);
        } else {
            // Note: this branch runs for every non-matching (a, s) pair,
            // so it can log up to 180,000 × 1,500 warnings
            if (s) {
                logger.warn(`${s.number} not found in master list`);
            }
        }
    }
}

The problem with the above code is that at some point it crashes with this error:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

So how can I do this efficiently without running out of memory?

EDIT - SAMPLE JSON

Dataset A
{
  "applicationNumber": "3434343",
  "firstName": "dcds",
  "otherNames": "sdcs",
  "surname": "sdcs",
  "phone": "dscd",
  .
  .
  .
  "stateOfOrigin": "dcsd"
}

Dataset B
{
    "number": "3434343",
    "fullName": "dcds sdcs sdcs"
}
Yemi Kudaisi

2 Answers


Try giving Node more memory to work with:

node --max-old-space-size=1024 index.js # increase to 1 GB
node --max-old-space-size=2048 index.js # increase to 2 GB
node --max-old-space-size=3072 index.js # increase to 3 GB
node --max-old-space-size=4096 index.js # increase to 4 GB
node --max-old-space-size=5120 index.js # increase to 5 GB
node --max-old-space-size=6144 index.js # increase to 6 GB
node --max-old-space-size=7168 index.js # increase to 7 GB
node --max-old-space-size=8192 index.js # increase to 8 GB
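To confirm the flag actually took effect, you can print V8's heap limit from inside the script. This is a minimal sketch using Node's built-in `v8.getHeapStatistics()`; the file name `check-heap.js` is hypothetical. The reported `heap_size_limit` covers the whole V8 heap, so expect it to be somewhat larger than the `--max-old-space-size` value rather than exactly equal to it.

```javascript
// check-heap.js (hypothetical name): print the V8 heap limit of this process.
// Run it as e.g. `node --max-old-space-size=4096 check-heap.js` to compare.
const v8 = require("v8");

function heapLimitMB() {
  // heap_size_limit is reported in bytes; convert to megabytes
  return v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
}

console.log(`Heap limit: ${heapLimitMB().toFixed(0)} MB`);
```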

Also, your script may take a long time to run. To improve performance, consider using a Map or converting your large array into an object keyed by applicationNumber for fast lookups:

const obj = a.reduce((acc, current) => {
  // key each Dataset A record by its applicationNumber
  acc[current.applicationNumber] = current;
  return acc;
}, {});

You can then look up full details in constant time:

const fullDetailsOfFirstObject = obj[B[0].number];
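Combining this suggestion with the question's variable names, a minimal sketch of the whole join might look like the following. It assumes both datasets are already loaded as arrays of objects shaped like the sample JSON, uses a `Map` instead of a plain object, and uses `console.warn` in place of the question's `logger`. The helper name `findSponsoredDetails` is hypothetical. Each Dataset B record is looked up once, so the work is roughly 180,000 + 1,500 operations instead of 180,000 × 1,500 comparisons, and each missing number is reported once rather than once per non-matching pair.

```javascript
// Index Dataset A by applicationNumber, then look up each Dataset B record once.
// Assumes both inputs are arrays shaped like the sample JSON in the question.
function findSponsoredDetails(allApplicants, sponsoredApplicants) {
  // One pass over Dataset A: applicationNumber -> full record
  const byNumber = new Map();
  for (const applicant of allApplicants) {
    byNumber.set(applicant.applicationNumber, applicant);
  }

  const found = [];
  const missing = [];
  // One pass over Dataset B: constant-time lookup per record
  for (const s of sponsoredApplicants) {
    if (!s) continue; // skip null/undefined entries, as the original code does
    const match = byNumber.get(s.number);
    if (match) {
      found.push(match);
    } else {
      missing.push(s.number);
    }
  }
  return { found, missing };
}

// Usage with tiny hypothetical data
const A = [
  { applicationNumber: "3434343", firstName: "dcds", surname: "sdcs" },
  { applicationNumber: "1111111", firstName: "x", surname: "y" },
];
const B = [
  { number: "3434343", fullName: "dcds sdcs sdcs" },
  { number: "9999999", fullName: "not in A" },
];

const { found, missing } = findSponsoredDetails(A, B);
for (const n of missing) {
  console.warn(`${n} not found in master list`); // logged once per missing number
}
console.log(found.length); // 1
```

If applicationNumber is not unique in Dataset A, the Map keeps only the last record for each number, whereas the original nested loop would push every match.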
Andy Gaskell

It may not be the most efficient approach, but one that will work is:

1) Import Dataset A (the huge one) into a database, for example SQLite or any database you are familiar with.

2) Add an index on the applicationNumber field.

3) Query the database for each element of Dataset B, or query in bulk (selecting several records at a time).
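The bulk option in step 3 can be sketched without committing to a particular database driver: split Dataset B's numbers into chunks and build one parameterized `IN (...)` query per chunk. The table name `applicants`, the index name, the helper names, and the chunk size of 500 are all assumptions for illustration. SQLite limits the number of bound parameters per statement (999 by default in versions before 3.32.0), so chunking keeps each query under that limit. The resulting `{ sql, params }` pairs would then be passed to whatever database client you use.

```javascript
// Hypothetical schema: a table named `applicants` holding Dataset A,
// with an index on applicationNumber (step 2).
const CREATE_INDEX_SQL =
  "CREATE INDEX IF NOT EXISTS idx_applicants_number ON applicants (applicationNumber)";

// Split an array into chunks of at most `size` elements.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build one parameterized SELECT per chunk of Dataset B numbers (step 3, bulk).
// chunkSize = 500 is an assumed value that stays under SQLite's older 999-parameter default.
function buildBulkQueries(sponsoredApplicants, chunkSize = 500) {
  const numbers = sponsoredApplicants.filter(Boolean).map((s) => s.number);
  return chunk(numbers, chunkSize).map((group) => ({
    sql:
      "SELECT * FROM applicants WHERE applicationNumber IN (" +
      group.map(() => "?").join(", ") +
      ")",
    params: group,
  }));
}

// Usage: 1,200 hypothetical Dataset B records -> 3 queries (500 + 500 + 200)
const sampleB = Array.from({ length: 1200 }, (_, i) => ({ number: String(i) }));
const queries = buildBulkQueries(sampleB);
console.log(queries.length); // 3
console.log(queries[2].params.length); // 200
```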

I've done this before for a similar use case and it worked, but there may be better ways to do it in your case.

Goran Stoyanov