
I am new to Node. I have written some code using Node and Phantom to scrape a website. It works for google.com but not for facebook.com, and I suspect the reason is that the Facebook page internally makes Ajax requests to other files to get its data.

var phantom = require('phantom');

phantom.create(function(ph) {
    return ph.createPage(function(page) {
        return page.open("https://facebook.com/", function(status) {
            if (status !== 'success') {
                console.log('Unable to load the url!');
                ph.exit();
            } else {
                setTimeout(function() {
                    return page.evaluate(function() {
                        return document.getElementsByTagName('body')[0].innerHTML;
                    }, function(result) {
                        console.log(result); // Log out the data.
                        ph.exit();
                    });
                }, 5000);
            }
        });
    });
});

So when I run my code against Facebook, it reports that it is unable to load the URL, but against Google it returns the body HTML.

Can anybody tell me what changes I should make to get the result?

PhantomJS version: 1.9.0

Artjom B.
Avoid
  • Sorry @ArtjomB. That was my fault. – Avoid Jan 29 '15 at 23:28
  • It is version 1.9.0. – Avoid Jan 29 '15 at 23:29
  • Actually, I want the whole HTML. After some time, a few JS files are executed and make requests to the server, and only then is the whole HTML rendered. I want that HTML, but I am not getting it: I get only the initially rendered elements plus the JS files for the rest. Is it possible to write the code that way? @ArtjomB. – Avoid Jan 30 '15 at 00:05

1 Answer

You should pass some command-line options to PhantomJS so that it uses TLSv1 instead of SSLv3 and, optionally, ignores SSL errors (--web-security=false might also be helpful):

phantom.create('--ssl-protocol=tlsv1', '--ignore-ssl-errors=true', function(ph) {
    ...

The reason this might be an issue is that many sites have removed SSLv3 support because of the POODLE vulnerability, while PhantomJS 1.9.0 still uses SSLv3 by default.
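For illustration, here is a minimal sketch of how the code from the question might look with those options applied. It assumes the same callback-style bridge API that the question uses, where flags are passed to `create` as leading string arguments. The name `scrapeBody` and its `phantomModule`, `delayMs` and `done` parameters are hypothetical and exist only for this example; whether Facebook actually loads still depends on the PhantomJS build installed.

```javascript
// Hypothetical helper: the question's scraper plus the SSL/TLS flags above.
// `phantomModule` is whatever require('phantom') returns; it is passed in
// so the function does not depend on how the module is loaded.
function scrapeBody(phantomModule, url, delayMs, done) {
    phantomModule.create('--ssl-protocol=tlsv1', '--ignore-ssl-errors=true', function (ph) {
        ph.createPage(function (page) {
            page.open(url, function (status) {
                if (status !== 'success') {
                    // Shut PhantomJS down before reporting the failure.
                    ph.exit();
                    return done(new Error('Unable to load ' + url));
                }
                // Give the page's Ajax requests some time to finish.
                setTimeout(function () {
                    page.evaluate(function () {
                        // Runs inside the page, not inside Node.
                        return document.body.innerHTML;
                    }, function (html) {
                        ph.exit();
                        done(null, html);
                    });
                }, delayMs);
            });
        });
    });
}

// Possible usage (assumes the phantom module is installed):
// scrapeBody(require('phantom'), 'https://facebook.com/', 5000, function (err, html) {
//     console.log(err ? err.message : html);
// });
```

The fixed delay is kept from the question; it is a guess at how long the page needs, not a guarantee that every Ajax request has completed.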

This answer provides the solution for plain PhantomJS. My answer here elaborates on that issue in more detail for CasperJS.

Artjom B.