
I have a small piece of JavaScript code that fetches the content of HTML pages and then processes them (a crawler). The problem is that `request` executes asynchronously. I tried using Promises and async/await, but I still get asynchronous execution. I want to crawl multiple pages at once before moving on to the next step. Here is code similar to what I have:

const rootlink= 'https://jsonplaceholder.typicode.com/posts/';

async function f (){
    await f1()
    f3()
}

async function f1(){
    return new Promise(async (resolve,reject)=>{
        log('f1 start');
         for(let i=1;i<11;i++){
            await request(rootlink+i,(err, res,html)=>{
                if(!err && res.statusCode==200){
                    log('link '+i +' done');
                    resolve();
                }
                else reject()
            })
        }
    })
}

function f3(){
    console.log('f3')
}

f()

The result should be: f1 start link 1 done link 2 done link 3 done link 4 done link 5 done link 6 done link 7 done link 8 done link 9 done link 10 done f3

Instead, I get: f1 start link 1 done f3 link 2 done link 3 done link 4 done link 5 done link 6 done link 7 done link 8 done link 9 done link 10 done

Dez
    It seems that what you want to achieve is not really asynchronous, you might want to perform a synchronous request instead – kant312 Oct 28 '19 at 17:35
  • yes that's what i really want – Imed Eddine BOUDRAA Oct 28 '19 at 17:37
  • `return new Promise(async (...` is a bit weird. Why not just `return (async (...) => {...})()`. Probably won't actually change anything, but removing unnecessary code & indentation is never bad for debugging. – CollinD Oct 28 '19 at 17:38
  • take a look at this: https://stackoverflow.com/questions/37576685/using-async-await-with-a-foreach-loop?rq=1 – Matt Aft Oct 28 '19 at 17:42

3 Answers


NOTE: I would use an isomorphic fetch package like node-fetch to write code that can run in multiple environments. Even if you don't plan to use this in a browser, becoming familiar with the API is very beneficial for future use. At the very least, this approach allowed me to write a snippet that you can actually run on Stack Overflow.

Promise.all() is your answer no matter which package you use, though. You simply wait for ALL the promises to resolve, then run your logic:

// const fetch = require('node-fetch')

const fetchData = (...args) => fetch(...args).then(r => {
  if (!r.ok) throw new Error('Error!')
  return r.json()
})

const getAllPostsAsync = (postIds) => Promise.all(
  postIds.map(postId => fetchData(`https://jsonplaceholder.typicode.com/posts/${postId}`))
)

;(async () => {
  const posts = await getAllPostsAsync([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
  
  // TODO: Your logic here, after waiting for all posts to load
  console.log(posts)
})()
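One caveat worth knowing: `Promise.all` rejects as soon as any one promise rejects, discarding the rest. If partial results are acceptable for your crawler, `Promise.allSettled` (Node 12.9+) collects every outcome instead. A minimal sketch, with stubbed promises standing in for the fetches:

```javascript
;(async () => {
  // Two stand-in promises: one succeeds, one fails.
  const results = await Promise.allSettled([
    Promise.resolve({ id: 1, title: 'first post' }),
    Promise.reject(new Error('network error')),
  ])

  // Keep only the fulfilled results.
  const posts = results
    .filter(r => r.status === 'fulfilled')
    .map(r => r.value)

  console.log(posts) // [ { id: 1, title: 'first post' } ]
})()
```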
BCDeWitt

Given that you want to run multiple asynchronous operations in parallel, you actually don't want to `await` each one individually, as that makes them run one after another.

First, I would say it's better to find an HTTP library that uses promises. The one you use has callbacks, but the request project also has a request-promise package that's much easier to use.

Here's a fixed version of your f1 function that uses promises correctly. Note that it is not parallelized yet.

const request = require('request-promise');

async function f1() {
  console.log('f1 start');
  for (let i = 1; i < 11; i++) {
    // request-promise resolves with the response body by default;
    // resolveWithFullResponse exposes statusCode.
    const res = await request({ uri: rootlink + i, resolveWithFullResponse: true });
    if (res.statusCode === 200) {
      console.log('link ' + i + ' done');
    }
  }
}

Here is another version of the same function, now fully parallelized.

async function f1() {
  console.log('f1 start');
  const promises = [];
  for (let i = 1; i < 11; i++) {
    promises.push(
      request({ uri: rootlink + i, resolveWithFullResponse: true }).then((res) => {
        if (res.statusCode === 200) {
          console.log('link ' + i + ' done');
        }
      })
    );
  }

  await Promise.all(promises);
}

This can be made a bit more elegant if this is split up in multiple functions:

async function f1() {
  console.log('f1 start');
  const promises = [];
  for (let i = 1; i < 11; i++) {
    promises.push(checkLink(i));
  }

  await Promise.all(promises);
}

async function checkLink(i) {
  const res = await request({ uri: rootlink + i, resolveWithFullResponse: true });
  if (res.statusCode === 200) {
    console.log('link ' + i + ' done');
  }
}
Evert
  • Thank you Sir for the answer – Imed Eddine BOUDRAA Oct 28 '19 at 17:44
  • The problem is that in my case I have to do it as synchronous operations in order to store the results in a file later. In reality my second function (f2) crawls according to the results given by the first function (f1), and then another function stores the results when the two previous functions have finished their job. – Imed Eddine BOUDRAA Oct 28 '19 at 17:47

The answer to my question was to use a synchronous request via the sync-request package from https://www.npmjs.com/package/sync-request
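For reference, a minimal sketch of what that looks like, based on sync-request's documented API (`request(method, url)` blocks until the response arrives; this requires the package to be installed and network access):

```javascript
const request = require('sync-request')

const rootlink = 'https://jsonplaceholder.typicode.com/posts/'

for (let i = 1; i < 11; i++) {
  // Blocks the whole event loop until the response arrives --
  // tolerable in a one-off script, unacceptable in a server.
  const res = request('GET', rootlink + i)
  if (res.statusCode === 200) {
    console.log('link ' + i + ' done')
  }
}
console.log('f3')
```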

  • This package warns "you should not be using this in a production application". I would advise against settling with `sync-request` as your solution – BCDeWitt Oct 28 '19 at 18:06
  • Yes, I saw that warning. It's my only solution for the moment until I figure out a better one (I'll think of another way to do this job while avoiding synchronous operations) – Imed Eddine BOUDRAA Oct 28 '19 at 18:13