I'm trying to follow the great advice given here, but apparently I'm doing something wrong. I need to loop (or map) over listOfUrls, which is read from a file, and push each search result to the results array, but only the last result is saved. My code:

var page = require('webpage').create();
var fs = require('fs');

var fileContent = fs.read('list.txt');
var listOfUrls = fileContent.split('\n');
var results = [];

function SearchPage(url, callback) {
  url = 'http://' + url;
  page.open(url, function (status) {
    var content = page.content;
    var found = content.indexOf('body');
    var result;
    if (found !== -1) {
      result = '>>>Found: ' + url;
    } else {
      result = 'Not found: ' + url;
    }
    callback(result);
  });
}

listOfUrls.map(function (elem) {
  SearchPage(elem, function (result) {
    results.push(result);
    console.log(results);  // only last result is in the array
  });
});
  • Have you understood the issue in my answer on the linked duplicate question? – Artjom B. Jun 14 '16 at 17:22
  • Not exactly. I used `page.close()` inside `page.open()`, but it did not work, so I gave up. Second, I do not call `phantom.exit()` anywhere and yet the code works. I do not understand why. @Artjom B. – Tehawanka Jun 15 '16 at 13:11
  • It cannot work here as described in my answer, because the loop is executed immediately and you cannot open multiple pages on the same `page` instance. That's why you need to create multiple instances and then close them. A `page` is essentially a tab in a desktop browser. How do you think it is possible for a tab to show multiple pages at the same time? About the `phantom.exit()`, if you don't have it, then the process just doesn't terminate and you would have to terminate it yourself. If you're automating stuff, then you don't want to terminate a process yourself. – Artjom B. Jun 15 '16 at 13:15
  • OK, I did it the recursive way as you suggested and it works. Thanks a lot! I still have a lot to learn about async JS... – Tehawanka Jun 15 '16 at 13:36
  • Btw, I went through your answer again and managed to do it non-recursively: [link](https://codeshare.io/tNr6c) @Artjom B. – Tehawanka Jun 15 '16 at 15:22
  • `page.close()` is important, because the pages that are created in the loop do not go away on their own. They have to be closed explicitly. If you don't close them, you will run into memory problems. Did you have a problem with that? – Artjom B. Jun 15 '16 at 15:27
  • Actually, I forgot it. I have added it now and it also works. Honestly, I do not know how to check memory usage (on a Windows localhost). – Tehawanka Jun 15 '16 at 15:46
  • You could add many more URLs to the array, comment out the `checkFinished` call and `page.close()`, and look into the task manager. If you try it with multiple sizes of the URLs array, then you should see how this impacts the memory footprint. – Artjom B. Jun 15 '16 at 15:51
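
The two approaches discussed in the comments above (one page reused sequentially, or one page per URL closed explicitly) could be sketched roughly as follows. This is only an illustration under assumptions, not the code behind the codeshare link: the names `formatResult`, `searchSequentially`, `searchInParallel` and the injected `createPage` factory are hypothetical, and the load-status guard is an addition that the question's code does not have. Only `page.open`, `page.content`, `page.close` and `phantom.exit` are taken from the PhantomJS API mentioned in the thread.

```javascript
// `createPage` is a hypothetical injection point so the sketch is not tied to
// one runtime. Under PhantomJS it could be:
//   function () { return require('webpage').create(); }

// Same substring test as in the question, plus a load-status guard
// (the guard is an assumption added here, not part of the original code).
function formatResult(url, status, content) {
  var found = status === 'success' && content.indexOf('body') !== -1;
  return (found ? '>>>Found: ' : 'Not found: ') + url;
}

// Approach 1: a single page, one URL at a time. The next open() is issued
// only from inside the previous open() callback, so loads never overlap.
function searchSequentially(urls, createPage, done) {
  var results = [];
  var page = createPage();
  function next(index) {
    if (index >= urls.length) {
      page.close();
      return done(results);
    }
    var url = 'http://' + urls[index];
    page.open(url, function (status) {
      results.push(formatResult(url, status, page.content));
      next(index + 1);
    });
  }
  next(0);
}

// Approach 2: one page per URL, all opened right away. Every page is closed
// explicitly, and checkFinished calls `done` once no callbacks are pending.
function searchInParallel(urls, createPage, done) {
  var results = [];
  var pending = urls.length;
  function checkFinished() {
    if (pending === 0) {
      done(results);
    }
  }
  checkFinished(); // covers an empty URL list
  urls.forEach(function (host, index) {
    var page = createPage();
    var url = 'http://' + host;
    page.open(url, function (status) {
      results[index] = formatResult(url, status, page.content); // keep input order
      page.close();
      pending -= 1;
      checkFinished();
    });
  });
}

// Possible usage under PhantomJS (assumes list.txt holds one host per line):
// var urls = require('fs').read('list.txt').split('\n').filter(Boolean);
// searchInParallel(urls, function () { return require('webpage').create(); },
//   function (results) {
//     console.log(results.join('\n'));
//     phantom.exit();
//   });
```

In both variants `done` is the only place where the complete results array is available, so that is also where `phantom.exit()` would go if the script should terminate by itself.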
