I have a page with, say, 30 URLs. I need to click on each one and check whether an element exists on the page it leads to. Currently, this means:

$('area').each(function(){
    $(this).attr('target','_blank');
    var _href = $(this).attr("href"); 
    var appID = (window.location.href).split('?')[1];
    $(this).attr("href", _href + '?' + appID);
    $(this).trigger('click');
});

This opens 30 new tabs, and I go through them manually.

(All URLs are in the same domain.)

It would be really nice to have a crawler with the following logic:

$('area').each(function(){

    // 1) get the HREF
    // 2) follow it
    // 3) on that new page:
    if ($('.element').length) {   // a jQuery object is always truthy, so test .length
        // push the $('area') into array1
    } else {
        // push the $('area') into array2
    }
});

// 4) Display array1 in green
//    Display array2 in red

Basically, I'd like to generate a report that says:

X crawled pages had element Y

Z crawled pages did not have element Y

I'm obviously stuck on making JavaScript/jQuery work in a newly opened tab.

I've found this, this and this, but I'm not entirely sure whether this is viable.

Can this be done with JavaScript/jQuery?

I'm only asking for the right direction; I'll do the steps myself.

Many thanks

Andrejs

1 Answer


I suggest using an iframe to load each page.

For example:

$.each($("area"), function(index, link) {
    var href = $(link).attr("href");
    // your link preprocessing logic ...

    var $iframe = $("<iframe />").appendTo($("body"));
    $iframe.attr("src", href).on("load", function() {
        // the loaded page's DOM is available through .contents()
        var $bodyContent = $iframe.contents().find("body");
        // check the iframe content here, then remove the iframe
        $iframe.remove();
    });
});
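
Building on the iframe idea, here is a minimal sketch of how the results might be collected into the two arrays from the question. It uses plain DOM calls rather than jQuery, and it loads the pages one at a time so that only one iframe exists at once. The names `checkPage` and `crawl` are hypothetical, and the sketch assumes every link is same-origin, as the question states.

```javascript
// Hypothetical helper (browser-only): loads one same-origin page in a hidden
// iframe and reports whether `selector` matches anything in it.
function checkPage(href, selector, done) {
    var iframe = document.createElement("iframe");
    iframe.style.display = "none";
    iframe.onload = function () {
        // contentDocument is only readable when the page is same-origin
        var found = iframe.contentDocument.querySelector(selector) !== null;
        iframe.remove();
        done(found);
    };
    iframe.src = href;
    document.body.appendChild(iframe);
}

// Checks the links sequentially. `check` is passed in as a parameter so the
// loop itself can be exercised without a browser.
function crawl(hrefs, selector, check, onFinished) {
    var withElement = [];    // "array1" in the question
    var withoutElement = []; // "array2" in the question

    function next(i) {
        if (i >= hrefs.length) {
            onFinished(withElement, withoutElement);
            return;
        }
        check(hrefs[i], selector, function (found) {
            (found ? withElement : withoutElement).push(hrefs[i]);
            next(i + 1);
        });
    }

    next(0);
}
```

In the browser this could be called as `crawl(hrefs, ".element", checkPage, callback)`, where `hrefs` might come from `$("area").map(function () { return $(this).attr("href"); }).get()`.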

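I should say, though, that if your crawler and the checked pages are on different domains, there will be cross-origin problems: the browser's same-origin policy prevents the crawler from reading the iframe's content.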
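
If it is unclear whether a given link is safe to inspect this way, one option is to compare origins before loading it. The sketch below uses the standard `URL` constructor; the helper name `isSameOrigin` is made up for illustration.

```javascript
// Hypothetical helper: true when `href` (which may be relative) resolves to
// the same scheme, host and port as `pageUrl`.
function isSameOrigin(href, pageUrl) {
    return new URL(href, pageUrl).origin === new URL(pageUrl).origin;
}
```

In the crawler page, `pageUrl` would typically be `window.location.href`, and links for which the check fails could be skipped or reported separately.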

I've created a simple project that shows how to implement this approach. You can download it here and run it on a local web server (Apache, IIS, etc.).
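
For the report itself, a rough sketch could look like the following. It assumes the crawl has already produced two arrays of hrefs (pages with the element and pages without it). The function names `buildReport` and `renderReport` are hypothetical and are not taken from the downloadable project.

```javascript
// Hypothetical helper: builds the two summary lines the question asks for.
// `label` names the element that was looked for, e.g. ".element".
function buildReport(withElement, withoutElement, label) {
    return [
        withElement.length + " crawled pages had element " + label,
        withoutElement.length + " crawled pages did not have element " + label
    ];
}

// Browser-only: writes each summary line into `container`, followed by the
// matching links in green or the non-matching links in red.
function renderReport(container, withElement, withoutElement, label) {
    var lines = buildReport(withElement, withoutElement, label);
    var groups = [
        [lines[0], withElement, "green"],
        [lines[1], withoutElement, "red"]
    ];
    groups.forEach(function (group) {
        var heading = document.createElement("p");
        heading.textContent = group[0];
        container.appendChild(heading);
        group[1].forEach(function (href) {
            var item = document.createElement("div");
            item.textContent = href;
            item.style.color = group[2];
            container.appendChild(item);
        });
    });
}
```

Keeping the text generation in `buildReport` separate from the DOM work means the summary can also be logged to the console or checked without a page.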

Sergey