
I am using Selenium with PhantomJS to scrape a URL. I initialized the driver as below:

final DesiredCapabilities caps = DesiredCapabilities.chrome();
caps.setCapability(
        PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
        "PhantomJsPath");
caps.setCapability("page.settings.loadImages", false);
caps.setCapability("trustAllSSLCertificates", true);

RemoteWebDriver driver = new PhantomJSDriver(caps);
driver.setLogLevel(Level.OFF);
driver.get("https://.......");

The page source obtained from the driver is empty.

Am I missing anything?

  • Selenium is a poor choice for web scraping. Have you looked at something like curl or httpbuilder? – SiKing Oct 27 '14 at 22:00
  • I need to submit a form and apply JavaScript changes, so I preferred Selenium. Could you get it to work for scraping an HTTPS URL? I think the URL I am going to scrape has unknown certificates, and that is why it is not scraping; we have to set a parameter to ignore SSL errors, but I could not find the correct parameters. – Babu Oct 28 '14 at 03:24

1 Answer


Recently, the POODLE vulnerability forced websites to remove SSLv3 support. Since PhantomJS versions before 1.9.8 use SSLv3 by default, the page cannot be loaded. To fix this, you need to run PhantomJS with --ssl-protocol=tlsv1 or --ssl-protocol=any. See this answer for plain PhantomJS.

DesiredCapabilities caps = DesiredCapabilities.phantomjs(); // or new DesiredCapabilities();
caps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,
        new String[] {"--ssl-protocol=tlsv1"});
// other capabilities
WebDriver driver = new PhantomJSDriver(caps);

If this doesn't solve the issue, you can also add

"--web-security=false", "--ignore-ssl-errors=true"

to the String array of cli args as seen in SiKing's answer here.
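If you need both the TLS protocol flag and the SSL-error flags, you end up passing one combined array as the PHANTOMJS_CLI_ARGS capability. A minimal sketch in plain Java (the class and method names here are made up for illustration; only the flag strings come from the answer above):

```java
import java.util.Arrays;

public class PhantomArgs {
    // The combined PhantomJS CLI arguments: negotiate TLS instead of
    // SSLv3, and ignore certificate/web-security errors.
    static String[] cliArgs() {
        return new String[] {
            "--ssl-protocol=tlsv1",
            "--web-security=false",
            "--ignore-ssl-errors=true"
        };
    }

    public static void main(String[] args) {
        // This is the array you would set via
        // caps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgs());
        System.out.println(Arrays.toString(cliArgs()));
    }
}
```

Whether you actually want --web-security=false depends on the site; start with the SSL flags alone and add it only if the page still fails to load.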
