
tl;dr Need some help with Promises.

Here's a little scraper function at the core of it all:

function request(method, url) {
    return new Promise(function (resolve, reject) {
        var xhr = new XMLHttpRequest();
        xhr.open(method, url);
        xhr.onload = resolve;
        xhr.onerror = reject;
        xhr.send();
    });
}

I also have a fairly large list of profiles I'd like to resolve:

const profiles = ['http://www.somesite.com/profile/1',
'http://www.somesite.com/profile/2'] // 100+ more

How do I process them in batches of, say, 5 at a time?

Here's my thought process so far:

  1. Split into chunks of N using _.chunk()
  2. Await Promise.all() resolving said chunk

Here's what I have so far:

async function processProfileBatch(batchOfUrls) {
    let promises = [];

    // Populate promises
    for (let i = 0; i < batchOfUrls.length; i++) {
        let url = batchOfUrls[i]
        promises.push(request('GET', url))
    }

    // Wait for .all to resolve
    return await Promise.all(promises)
}
const profileBatches = _.chunk(profiles, 5)
for (let i = 0; i < profileBatches.length; i++) {
    let processedBatch = await processProfileBatch(profileBatches[i])
    console.log(new Date(), 'processedBatch', processedBatch);
}

Unfortunately this just returns ProgressEvents; upon inspection the xhr contained within has .responseText set to "" even though readyState is 4:

[screenshot: DevTools console showing the resolved ProgressEvent; its xhr has responseText === "" even though readyState === 4]

dsp_099
  • You could use [fetch](https://www.npmjs.com/package/node-fetch) instead of XMLHttpRequest – HMR Jun 02 '18 at 08:56
  • @HMR Can't use fetch - target site is sensitive to cookies/sessions or something. – dsp_099 Jun 02 '18 at 09:00
  • Getting `ProgressEvent` is normal for listening to `onload`, you can do `xhr.onload = () => resolve(xhr)` or something to get a different result value – Bergi Jun 02 '18 at 09:03
  • If the `responseText` is empty, that most likely sounds like an [SOP issue](https://stackoverflow.com/q/3076414/1048572) – Bergi Jun 02 '18 at 09:04
  • @Bergi Negative; if I add a sleep(1000) into the Promise.all() to make it take a bit longer, all the xhr promises resolve. Without adding it, sometimes they resolve and other times they do not. It feels like I can't run multiple XHRs at the same time though that doesn't make any sense... – dsp_099 Jun 02 '18 at 09:09
  • @dsp_099 What's the site that you're loading, maybe it rate-limits requests? Also what's the `statusCode` of the responses? – Bergi Jun 02 '18 at 09:17
  • did you try fetch with credentials include? `fetch(url, { method: 'GET', credentials: 'include' })` – HMR Jun 02 '18 at 09:26
  • @HMR is it possible that Chrome console would show empty string when in fact there was a large string there? I just ran 10 concurrent xhrs and fetches and they all returned expected values. – dsp_099 Jun 02 '18 at 09:39

1 Answer


The issue with chunking is that you have x active requests and then wait for all of them to finish before starting the next x requests, so a single slow request stalls the whole batch.

With a throttle you can instead keep x requests active continuously until all are done.
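For intuition, here is a minimal sketch of what such a concurrency limiter does (a hypothetical standalone `throttle`, not lib's actual implementation):

```javascript
// Hypothetical concurrency limiter: throttle(max) returns a wrapper;
// wrapped async functions share a pool of at most `max` in-flight calls.
function throttle(max) {
  let inFlight = 0;
  const queue = [];
  const next = () => {
    if (inFlight >= max || queue.length === 0) return;
    inFlight++;
    const { fn, args, resolve, reject } = queue.shift();
    Promise.resolve(fn(...args))
      .then(resolve, reject)
      .finally(() => { inFlight--; next(); });
  };
  return fn => (...args) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, args, resolve, reject });
      next();
    });
}
```

Each call is queued, and a slot frees up as soon as any request settles, so up to `max` requests are always running instead of batch-sized bursts with idle gaps in between.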

//lib comes from: https://github.com/amsterdamharu/lib/blob/master/src/index.js
const lib = require("lib");

function processProfileBatch(batchOfUrls) {
  const max10 = lib.throttle(10); // at most 10 requests in flight at once
  return Promise.all(
    batchOfUrls.map(url =>
      max10(u => request('GET', u))(url)
    )
  );
}

If you'd rather limit requests per period (say, 2 per second) instead of limiting the number of active connections, you could use throttlePeriod:

const twoPerSecond = lib.throttlePeriod(2, 1000);
// ... other code
twoPerSecond(url => request('GET', url))(url)
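Under the hood, a per-period limiter can be sketched roughly like this (an illustrative sketch, not lib's actual code):

```javascript
// Hypothetical rate limiter: throttlePeriod(max, period) allows at most
// `max` calls to start within any `period` milliseconds.
function throttlePeriod(max, period) {
  let starts = []; // timestamps of calls started within the current window
  const run = (fn, args, resolve, reject) => {
    const now = Date.now();
    starts = starts.filter(t => now - t < period); // drop expired starts
    if (starts.length < max) {
      starts.push(now);
      Promise.resolve(fn(...args)).then(resolve, reject);
    } else {
      // retry once the oldest start falls out of the window
      setTimeout(() => run(fn, args, resolve, reject), period - (now - starts[0]));
    }
  };
  return fn => (...args) =>
    new Promise((resolve, reject) => run(fn, args, resolve, reject));
}
```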

You may also want to avoid throwing away all resolved requests when a single one rejects. By resolving rejected requests with a special sentinel value, you can separate the rejected from the resolved requests afterwards:

//lib comes from: https://github.com/amsterdamharu/lib/blob/master/src/index.js
const lib = require("lib");

function processProfileBatch(batchOfUrls) {
  const max10 = lib.throttle(10);
  return Promise.all(
    batchOfUrls.map(url =>
      max10(u => request('GET', u))(url)
        .catch(err => new lib.Fail([err, url])) // resolve with a Fail marker instead of rejecting
    )
  );
}

processProfileBatch(urls)
.then( // this does not reject because rejections are caught and become Fail objects
  results => {
    const successes = results.filter(lib.isNotFail);
    const failed = results.filter(lib.isFail);
  }
)
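On newer runtimes (ES2020+), the built-in Promise.allSettled can do the same success/failure separation without a helper library; the wrapper below is just an illustrative sketch:

```javascript
// Promise.allSettled never rejects; each entry is either
// { status: 'fulfilled', value } or { status: 'rejected', reason }.
async function fetchAllSettled(urls, fetchOne) {
  const results = await Promise.allSettled(urls.map(fetchOne));
  return {
    successes: results.filter(r => r.status === 'fulfilled').map(r => r.value),
    failures: results.filter(r => r.status === 'rejected').map(r => r.reason),
  };
}
```

Note that allSettled alone does no throttling; it would still be combined with a concurrency or rate limit as above.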
HMR
  • This is neat, lemme ruminate for a second. I noticed however that if I just simply wait a little bit longer all of the requests resolve just fine. In other words, it looks like `Promise.all` resolves faster than it should for some inexplicable reason. – dsp_099 Jun 02 '18 at 09:01