I have an API endpoint that I reverse-engineered. I use it to search by name, and it returns no more than 100 entities per request. But there are about 1.3M+ of these entities that I want to fetch.
Here's a sample entity from the response:
{
  "name": "COMPANY NAME",
  "regNo": "100-H",
  "newRegNo": "191101000018",
  "type": "Company"
}
I can search by either name or regNo. There's no minimum character limit for searching. I thought of searching alphabetically, but since it returns no more than 100 entities at once, I cannot fetch the rest that way. So I tried to fetch by regNo instead. regNo can be from 1 up to 1000000.
Here's the script that I wrote to fetch all entities by their regNo:
const fs = require("fs");

// fetchData(query) is the request helper (not shown); it resolves to
// an object with an entityList array.
const test = async () => {
  const data = {};
  try {
    const requests = [];
    // Each request returns at most 100 entities, so the search query
    // advances by 100 on every iteration.
    for (let i = 100; i < 10000; i += 100) {
      requests.push(fetchData(i));
    }
    // All requests are started at once and awaited together.
    const result = await Promise.all(requests);
    result.forEach(res => {
      res.entityList.forEach(entity => {
        // Keyed by regNo, so duplicates across responses collapse.
        data[entity.regNo] = entity;
      });
    });
    // You can ignore this part
    fs.writeFile("data.json", JSON.stringify(data), err => {
      if (err) console.log(err);
    });
    console.log(Object.keys(data).length);
  } catch (err) {
    console.log(err);
  }
};
It took about 15 seconds to fetch 9100 entities (about 100 requests).
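Since the script above starts every request at once with Promise.all, scaling the loop up to 1000000 would mean roughly 10,000 simultaneous requests. A minimal sketch of sending them in fixed-size batches instead is below. The names runInBatches and fakeFetchData and the batch size of 10 are assumptions for illustration only; fakeFetchData merely stands in for the real fetchData.

```javascript
// Hypothetical helper: run one async worker per query, but keep at most
// `batchSize` requests in flight at any time.
async function runInBatches(queries, worker, batchSize = 10) {
  const results = [];
  for (let start = 0; start < queries.length; start += batchSize) {
    const batch = queries.slice(start, start + batchSize);
    // Wait for the whole batch before starting the next one.
    const batchResults = await Promise.all(batch.map(query => worker(query)));
    results.push(...batchResults);
  }
  return results;
}

// Stand-in for the real fetchData (assumption): resolves to an object
// with an entityList array, like the real responses.
const fakeFetchData = async query => ({
  entityList: [{ regNo: `${query}-A` }]
});

// Same queries as the loop in the script: 100, 200, ..., 9900.
const queries = [];
for (let i = 100; i < 10000; i += 100) {
  queries.push(i);
}

runInBatches(queries, fakeFetchData, 10).then(results => {
  console.log(results.length);
});
```

Batching trades some speed for a bounded number of open connections; the right batch size depends on what the endpoint tolerates, which I do not know.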
Every regNo has a one-letter suffix, like 11000-H.
If I fetch 100, it returns something like this:
entityList: [
{
name: "COMPANY NAME",
regNo: '100-H',
newRegNo: '191101000018',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000-V',
newRegNo: '193901000021',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '10000-T',
newRegNo: '197001000604',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '100000-D',
newRegNo: '198301004377',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000001-W',
newRegNo: '201001012078',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000002-K',
newRegNo: null,
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000003-U',
newRegNo: '201001012079',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000004-V',
newRegNo: '201001012080',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000005-D',
newRegNo: '201001012081',
type: 'Company'
},
{
name: "COMPANY NAME",
regNo: '1000006-A',
newRegNo: '201001012082',
type: 'Company'
},
.......
As you can see, it does not return entities from 0 to 99. I am assuming that the highest regNo is 1000000-suffixLetter, and that if I can fetch from 100 to 1000000 in a loop, I would get about 1M entities. But here's the catch: regNo has a suffix letter. Suppose that fetching 100 returns 100-A to 199-A. There are still other entities like 100-B, 100-C, etc.
How can I fetch 1.3M+ entities efficiently without losing data?
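To at least measure how much data is lost, the collected regNo values can be split into their numeric part and suffix letter and checked for gaps. This is only a sketch: parseRegNo and findMissingNumbers are hypothetical names, and it assumes every regNo has the form digits, a hyphen, and one uppercase letter, as in the sample response.

```javascript
// Split a regNo such as "100-H" into its numeric part and suffix letter.
// Returns null when the value does not match the assumed format.
function parseRegNo(regNo) {
  const match = /^(\d+)-([A-Z])$/.exec(regNo);
  if (!match) return null;
  return { number: Number(match[1]), suffix: match[2] };
}

// List the numbers in [from, to] for which no regNo was collected.
function findMissingNumbers(regNos, from, to) {
  const seen = new Set();
  for (const regNo of regNos) {
    const parsed = parseRegNo(regNo);
    if (parsed) seen.add(parsed.number);
  }
  const missing = [];
  for (let n = from; n <= to; n++) {
    if (!seen.has(n)) missing.push(n);
  }
  return missing;
}

// A few regNo values taken from the sample response.
const sample = ["100-H", "1000-V", "1000001-W", "1000002-K", "1000004-V"];
console.log(findMissingNumbers(sample, 1000001, 1000005)); // [ 1000003, 1000005 ]
```

A missing number does not prove lost data, since that number may simply not exist in the registry, but a large gap would point to queries that hit the 100-entity cap.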