I'm writing a node script to pull status codes from URLs with the https module. The script has an array of URLs. I'm using a forEach loop to iterate over them and make the https.get() calls.

The issue I'm seeing is that the https calls don't appear to happen (or, at least, don't trigger their callback function) until the entire forEach loop has completed, at which point they all appear to run.

Here's the code I'm working with:

const https = require('https')

const urls = [
  'https://www.example.com/',
  'https://www.example.org/',
  'https://www.example.net/',
  'https://www.example.com/',
  'https://www.example.org/',
  'https://www.example.net/',
  'https://www.example.com/',
  'https://www.example.com/',
  'https://www.example.com/',
  'https://www.example.com/',
  'https://www.example.org/',
  'https://www.example.net/',
]

function processResponse(response) {
  console.log(`${Date.now()} - ${response.statusCode}`)
}

urls.forEach((url) => {
  console.log(`Getting: ${Date.now()} - ${url}`)
  https.get(url, (response) => {
    processResponse(response)
  })
})

The output is consistently something like:

Getting: 1637980537771 - https://www.example.com/
Getting: 1637980537810 - https://www.example.org/
Getting: 1637980537811 - https://www.example.net/
Getting: 1637980537813 - https://www.example.com/
Getting: 1637980537816 - https://www.example.org/
Getting: 1637980537818 - https://www.example.net/
Getting: 1637980537820 - https://www.example.com/
Getting: 1637980537821 - https://www.example.com/
Getting: 1637980537823 - https://www.example.com/
Getting: 1637980537827 - https://www.example.com/
Getting: 1637980537830 - https://www.example.org/
Getting: 1637980537832 - https://www.example.net/
1637980537989 - 200
1637980537992 - 200
1637980537995 - 200
1637980537997 - 200
1637980538005 - 200
1637980538006 - 200
1637980538022 - 200
1637980538023 - 200
1637980538024 - 200
1637980538026 - 200
1637980538032 - 200
1637980538035 - 200

The timestamp of the status codes is always after the last item of the forEach loop. That's what I don't understand.

With several thousand URLs and the likelihood of broken ones, I want to make sure that each one is processed individually so I don't lose progress if things go sideways.

I'm open to using other modules/approaches, but I'd also like to know what's happening here since it doesn't comport with my mental model.

Alan W. Smith

1 Answer

Your https.get() is a non-blocking function. It doesn't wait for completion. It just starts the operation and immediately returns. So, your .forEach() loop just ends up starting all the operations and then sometime later, each one finishes and calls its callback.

If you want a loop where the loop waits for each asynchronous operation, then you should use a regular for loop and use an asynchronous operation that returns a promise. Then, you can use await on the promise-returning asynchronous operation and it will pause the loop.

For example, you could do this by using async/await and an http library that returns promises:

// Note: require() works with got v11 and earlier; got v12+ is ESM-only and needs import.
const got = require('got');

const urls = [
  'https://www.example.com/',
  'https://www.example.org/',
  'https://www.example.net/',
  'https://www.example.com/',
  'https://www.example.org/',
  'https://www.example.net/',
  'https://www.example.com/',
  'https://www.example.com/',
  'https://www.example.com/',
  'https://www.example.com/',
  'https://www.example.org/',
  'https://www.example.net/',
]

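async function runAll() {
  for (const url of urls) {
    console.log(`Getting: ${Date.now()} - ${url}`)
    try {
      // await pauses the loop until this request settles
      const result = await got(url);
      console.log(`${Date.now()} - ${result.statusCode}`)
    } catch (e) {
      // e.response is only set for HTTP errors; network failures (DNS, timeout) have none
      console.log(`${Date.now()} - ${e.message} - ${e.response?.statusCode}`);
    }
  }
}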

runAll().then(() => {
    console.log("all done");
}).catch(err => {
    console.log(err);
});
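
If you would rather not add a dependency, the same sequential pattern should also work with the built-in https module wrapped in a promise. This is only a minimal sketch: getStatusCode and checkAll are hypothetical helper names of my own, not part of Node, and it assumes you only need the status code and not the response body.

```javascript
const https = require('https')

// Hypothetical helper: wraps https.get in a promise that resolves
// with the status code once the response headers arrive.
function getStatusCode(url) {
  return new Promise((resolve, reject) => {
    https.get(url, (response) => {
      // Discard the body so the socket is released.
      response.resume()
      resolve(response.statusCode)
    }).on('error', reject)
  })
}

// Hypothetical helper: processes the URLs one at a time.
async function checkAll(urls) {
  const results = []
  for (const url of urls) {
    try {
      // The loop does not move on until this request settles.
      const statusCode = await getStatusCode(url)
      results.push({ url, statusCode })
    } catch (e) {
      // Record the failure and keep going so one bad URL
      // does not stop the rest of the run.
      results.push({ url, error: e.message })
    }
    console.log(results[results.length - 1])
  }
  return results
}
```

Calling checkAll(urls) returns a promise for the collected results, so each URL is logged as soon as it finishes and a broken one is recorded instead of aborting the run.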

"I'm open to using other modules/approaches, but I'd also like to know what's happening here since it doesn't comport with my mental model."

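https.get() is non-blocking. It starts the operation and immediately returns, so your loop keeps going without waiting for completion. On top of that, Node.js runs your JavaScript on a single thread: a callback can only run once the currently executing synchronous code (here, the whole forEach loop) has finished and control returns to the event loop. That is why every status code is logged after the last "Getting" line, no matter how quickly a response comes back. This is how non-blocking asynchronous operations typically work in JavaScript.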
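
You can reproduce the ordering without any network access. Here is a minimal sketch that uses setTimeout as a stand-in for https.get. That substitution is an assumption for illustration only: a timer is not an HTTP request, but its callback is likewise registered immediately and run later, after the synchronous code has finished.

```javascript
// Records the order in which things happen.
const events = []

const items = ['a', 'b', 'c']

items.forEach((item) => {
  events.push(`start ${item}`)
  // setTimeout stands in for https.get here: it registers a callback
  // and returns immediately, without waiting for the callback to run.
  setTimeout(() => {
    events.push(`done ${item}`)
  }, 0)
})

// Still synchronous: no callback has had a chance to run yet.
events.push('loop finished')

// Print the recorded order once the callbacks above have run.
setTimeout(() => {
  console.log(events.join('\n'))
}, 10)
```

All three "start" entries and "loop finished" are recorded before any "done" entry, which is the same pattern as in your output.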

jfriend00