I'm trying to scrape about 1400 pages, but Selenium randomly throws a `SocketException` at around the 1000th page. I've tried Chrome, Firefox and PhantomJS, but none of them worked. PhantomJS can't even handle the website properly, even though I've set the `javascriptEnabled` property to true, but that's a separate issue.

Here is the log:

Ara 07, 2016 4:09:09 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {}->http://localhost:1384: Permission denied: connect
Ara 07, 2016 4:09:09 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://localhost:1384
Exception in thread "main" org.openqa.selenium.WebDriverException: java.net.SocketException: Permission denied: connect
Build info: version: 'unknown', revision: '1969d75', time: '2016-10-18 09:43:45 -0700'
System info: host: 'DESKTOP-OA9G2Q7', ip: '192.168.1.7', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_101'
Driver info: driver.version: RemoteWebDriver
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:91)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:601)
    at org.openqa.selenium.remote.RemoteWebElement.execute(RemoteWebElement.java:274)
    at org.openqa.selenium.remote.RemoteWebElement.getAttribute(RemoteWebElement.java:126)
    at com.aliren.sp.scraping.Scraper.getRatioComparisonData(Scraper.java:227)
    at com.aliren.sp.scraping.Scraper.start(Scraper.java:136)
    at com.aliren.sp.scraping.Scraper.main(Scraper.java:104)
Caused by: java.net.SocketException: Permission denied: connect
    at java.net.DualStackPlainSocketImpl.connect0(Native Method)
    at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:83)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.http.conn.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:74)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:71)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.openqa.selenium.remote.internal.ApacheHttpClient.fallBackExecute(ApacheHttpClient.java:142)
    at org.openqa.selenium.remote.internal.ApacheHttpClient.execute(ApacheHttpClient.java:88)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:160)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:82)
    ... 6 more
Nerzid
  • It's kind of sporadic; I faced a similar issue once. It could be something like a failsafe that blocks repeated requests. Adding a wait time between two requests helped somewhat. Alternatively, see if this helps: http://stackoverflow.com/a/7478027/5212566 – Prageeth Saravanan Dec 07 '16 at 14:39
  • Well, there is already a `Thread.sleep()` between URL changes. I'm not sure how the link you posted can help me. Could you explain it? – Nerzid Dec 07 '16 at 18:40
  • Apologies, I shared the wrong thread. I was trying to share http://stackoverflow.com/a/27949629/5212566. That user has a similar issue but in a different context. The thread talks about getting "around" it by simply catching the exception and retrying. In your case you can try catching the socket exception and retrying, since it's random. It's not a fix, but I guess it can be a workaround. – Prageeth Saravanan Dec 07 '16 at 19:01
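
The catch-and-retry workaround suggested in the comments could look roughly like the sketch below. It is only an illustration, not code from the question: the class `SocketRetry` and the method `withRetry` are hypothetical names, and it assumes the failure surfaces as a `RuntimeException` (Selenium's `WebDriverException` is one) with a `java.net.SocketException` somewhere in its cause chain, as in the log above.

```java
import java.net.SocketException;
import java.util.function.Supplier;

// Hypothetical helper: retries an action when it fails because of a SocketException.
public class SocketRetry {

    // Returns true if a SocketException appears anywhere in the cause chain.
    static boolean causedBySocketException(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketException) {
                return true;
            }
        }
        return false;
    }

    // Runs the action up to maxAttempts times, sleeping delayMillis between attempts.
    // Exceptions not caused by a SocketException are rethrown immediately.
    static <T> T withRetry(Supplier<T> action, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (!causedBySocketException(e)) {
                    throw e;
                }
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and give up retrying.
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }
}
```

In the scraper, a call such as `element.getAttribute(...)` could then be wrapped as `SocketRetry.withRetry(() -> element.getAttribute("href"), 3, 2000)`, where `element` and the attribute name are placeholders. As the comment says, this only works around the symptom; it does not explain why the connection to the local driver is refused.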

1 Answer


Set the JVM system property `-Djava.net.preferIPv4Stack=true` so that Java prefers the IPv4 stack. This resolved the problem for me, as my network supports IPv6.
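
As a rough sketch of how the property might be applied: it is normally passed on the command line (`java -Djava.net.preferIPv4Stack=true ...`), but it can also be set in code, provided that happens before any networking classes are used, because the JVM reads it once when the network stack initializes. The class name `Ipv4Launcher` and the method `preferIpv4` below are hypothetical.

```java
// Hypothetical launcher: sets the property at the very start of main,
// before any sockets or WebDriver instances are created.
public class Ipv4Launcher {

    // Equivalent to passing -Djava.net.preferIPv4Stack=true on the command line,
    // as long as it runs before the JVM's networking code is initialized.
    static void preferIpv4() {
        System.setProperty("java.net.preferIPv4Stack", "true");
    }

    public static void main(String[] args) {
        preferIpv4();
        // Start the scraper here, after the property is set.
        System.out.println("preferIPv4Stack=" + System.getProperty("java.net.preferIPv4Stack"));
    }
}
```

The command-line flag is the safer choice, since it does not depend on class-loading order.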

Brandon Minnick