I want to make a crawling program as a Chrome extension.

One function in my program visits many web pages. This is my plan:

visit page, load page, crawl, visit next page, ...

I want my program to run sequentially, but 'load page' is async. Therefore, the code below does not work.

function load_and_download(checked_list) {
    for(var item of checked_list){
        visit_page(item);   //async function    
        crawl_page(item);
    }       
}

I found a solution like this.

function load_and_download(checked_list, now_index) {
    visit_page(checked_list[now_index], function () {//load finish
        crawl_page(checked_list[now_index]);
        if (checked_list.length > now_index + 1)
            load_and_download(checked_list, now_index+1);
    })
}

The code above works for me, but it is a recursive function. If checked_list is very long, is the code above still safe? I know recursive code can run into stack problems. Is JavaScript safe from that?

Redwings
  • The only way I know of for the language to automatically avoid stack overflow is "tail-call optimization", for which see: [Are functions in JavaScript tail-call optimized?](https://stackoverflow.com/questions/37224520/are-functions-in-javascript-tail-call-optimized) – IMSoP Apr 21 '21 at 15:24
  • JavaScript is not safe from stack overflows in a general way. This can be done without recursion in a variety of ways, e.g., a FIFO that's consumed until it's empty. It's a natural recursive problem, but anything that can be solved with recursion can be solved without it if it actually proves to be a problem. – Dave Newton Apr 21 '21 at 15:31
  • @IMSoP you mean, chrome extension can't avoid stack overflow? my code is dangerous? – Redwings Apr 21 '21 at 15:32
  • @DaveNewton How can i use FIFO with callback? I don't know how to do it. – Redwings Apr 21 '21 at 15:34
  • Could you have event listeners instead for the `window.onload` ? https://developer.mozilla.org/en-US/docs/Web/API/GlobalEventHandlers/onload – Pogrindis Apr 21 '21 at 15:40
  • _"but, it is recursive function"_ - No, it isn't. The callback from `visit_page()` calls `load_and_download()` but that will happen "outside" of `load_and_download()` (at least if `visit_page()` is also asynchronous which the callback suggests) – Andreas Apr 21 '21 at 15:42
  • @Pogrindis well, i think it's not problem for me... – Redwings Apr 21 '21 at 15:45
  • @Andreas really? it's not??? you mean, upper code is safe? – Redwings Apr 21 '21 at 15:46
  • "i can't this with for loop" - what does that mean? If that code works, what's the exact problem? – Nico Haase Apr 22 '21 at 09:55
  • @NicoHaase I edit question and add example code. – Redwings Apr 22 '21 at 13:25
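
The FIFO idea raised in the comments can be combined with a callback roughly as follows. This is only a sketch under assumptions: `visitPage` and `crawlPage` are hypothetical stand-ins for the question's `visit_page` and `crawl_page` (here `visitPage` just calls back through `setTimeout`), and `onDone` is an extra parameter added for illustration.

```javascript
// Hypothetical stand-in for visit_page: reports "page loaded" asynchronously.
function visitPage(item, onLoaded) {
  setTimeout(onLoaded, 0);
}

// Hypothetical stand-in for crawl_page: records which items were crawled.
const crawled = [];
function crawlPage(item) {
  crawled.push(item);
}

// Consume the queue one item at a time. The next item is taken only
// after the callback for the previous page has fired.
function loadAndDownload(checkedList, onDone) {
  const queue = checkedList.slice(); // copy, so the caller's array is untouched

  function next() {
    if (queue.length === 0) {
      if (onDone) onDone();
      return;
    }
    const item = queue.shift(); // FIFO: take from the front
    visitPage(item, function () {
      crawlPage(item);
      next();
    });
  }

  next();
}

loadAndDownload(["a", "b", "c"], function () {
  console.log(crawled); // items are crawled in list order
});
```

As with the code in the question, the stack only stays shallow because `visitPage` calls back asynchronously. If it invoked the callback synchronously, `next` would nest one level deeper per item.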

1 Answer


What you have here is not a recursive function if visit_page calls back asynchronously. I'd call this pattern pseudorecursion: because of the async callback, every call to load_and_download happens in a separate task, so the call stack only ever contains one call to that function (per task). Once the async action is scheduled in visit_page, the call stack unwinds again and the next task is processed. Therefore, although it looks like recursion, there's actually no recursion going on.

Here's a simplified example illustrating this:

function asynchronous(callback) {
  // Here an asynchronous action gets started and processed somewhere
  setTimeout(callback, 1000);
  // Execution continues synchronously and the callstack unwinds from here on
}

function pseudoRecursive() {
  asynchronous(function () {
    console.log("As we're in a new task, the callstack is basically empty:\n", (new Error()).stack);
    pseudoRecursive();
  });
  // Here the callstack unwinds again
}

pseudoRecursive();
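
If the recursive look of the callback version is a concern, the same sequential flow can probably be written as a plain loop with a Promise and `async`/`await`. The following is a minimal sketch, not the code from the question: `visitPage` and `crawlPage` are hypothetical stand-ins for `visit_page` and `crawl_page`, and `visitPageAsync` is a wrapper introduced here for illustration.

```javascript
// Hypothetical stand-in for visit_page: reports completion via a callback.
function visitPage(item, onLoaded) {
  setTimeout(onLoaded, 0);
}

// Hypothetical stand-in for crawl_page: records which items were crawled.
const visited = [];
function crawlPage(item) {
  visited.push(item);
}

// Wrap the callback-style function in a Promise so it can be awaited.
function visitPageAsync(item) {
  return new Promise(function (resolve) {
    visitPage(item, resolve);
  });
}

// A plain loop: each iteration waits for the page to load before crawling,
// so the items are processed strictly one after another.
async function loadAndDownload(checkedList) {
  for (const item of checkedList) {
    await visitPageAsync(item);
    crawlPage(item);
  }
  return visited;
}

const done = loadAndDownload(["a", "b", "c"]);
done.then(function (result) {
  console.log(result); // items in the order they were crawled
});
```

Each `await` suspends the function and lets the call stack unwind, so the length of the list does not affect stack depth here either.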
Jonas Wilms