
I crawled a batch of 15K URLs and saved the pages as HTML files. In the first iteration I got output for 10,980 URLs. In the second iteration this dropped to 9,700, and in the third it was 11,120.

So I checked the results (the hints/exceptions written from the catch block) that were logged to a text file. Most of the URLs failed with

java.net.UnknownHostException

Some URLs failed in the first and third iterations but were saved successfully in the second iteration.

I searched forums, and in most cases the reason given is essentially the exception's Javadoc description:

Thrown to indicate that the IP address of a host could not be determined.
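To check whether a logged URL's host really cannot be resolved, independent of the crawler, the lookup can be isolated. A minimal sketch (the `HostCheck` class and `addressesFor` method are hypothetical names, not part of my crawler) that pulls the host out of a URL and returns its addresses, or an empty list when `InetAddress.getAllByName` throws `UnknownHostException`:

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class HostCheck {
    private HostCheck() {}

    /**
     * Extracts the host from a URL and asks the JVM resolver for its addresses.
     * Returns the addresses as strings, or an empty list if the lookup throws
     * UnknownHostException (the "IP address could not be determined" case).
     */
    static List<String> addressesFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        try {
            return Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.toList());
        } catch (UnknownHostException e) {
            // The resolver could not map this host name to any address.
            return Collections.emptyList();
        }
    }
}
```

Running this over the failed URLs from the log some time after a crawl would show whether those hosts resolve now, i.e. whether the failures are transient rather than permanent.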

My question is: how were these URLs crawled successfully in my second iteration?

Please suggest some ways to resolve the UnknownHostException, or to find the IP address of the host.
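Since the same URL can fail in one run and succeed in another, one thing worth trying is retrying the lookup with a short backoff before giving up. A minimal sketch, assuming the failures are transient (the `RetryingResolver` class, its `Lookup` interface, and the attempt/delay values are hypothetical):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class RetryingResolver {
    /** Hypothetical hook so the real lookup can be swapped out, e.g. in tests. */
    @FunctionalInterface
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private RetryingResolver() {}

    /**
     * Tries the lookup up to maxAttempts times, sleeping between attempts with a
     * doubling delay. Rethrows the last UnknownHostException if every attempt fails.
     */
    static InetAddress[] resolveWithRetry(Lookup lookup, String host, int maxAttempts, long initialDelayMs)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.resolve(host);
            } catch (UnknownHostException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last; // non-null: at least one attempt ran and failed
    }
}
```

With the real resolver this would be called as `RetryingResolver.resolveWithRetry(InetAddress::getAllByName, host, 3, 2000)`. One caveat: the JVM caches failed lookups for a short time (the `networkaddress.cache.negative.ttl` security property), so retries that fall inside that window may just get the cached failure back.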

Note: the iterations above were run with 300 threads using an ExecutorService.

I then tried with a single thread. With that, the output count is the same across iterations.
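Because the counts only vary with 300 threads, one plausible (unconfirmed) explanation is that hundreds of simultaneous DNS queries overwhelm the local resolver or DNS server, which then drops or times out some of them. A way to test that without giving up parallel downloads is to cap how many lookups run at once. A sketch using a `Semaphore` (the `ThrottledResolver` class, its `Lookup` interface, and the limit are hypothetical):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Semaphore;

/** Caps how many DNS lookups run at the same time, regardless of the worker pool size. */
final class ThrottledResolver {
    /** Hypothetical hook so the real lookup can be replaced in tests. */
    @FunctionalInterface
    interface Lookup {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private final Semaphore permits;
    private final Lookup lookup;

    ThrottledResolver(int maxConcurrentLookups, Lookup lookup) {
        // Fair semaphore: waiting threads get permits in arrival order.
        this.permits = new Semaphore(maxConcurrentLookups, true);
        this.lookup = lookup;
    }

    /** Uses the JVM resolver (InetAddress.getAllByName). */
    ThrottledResolver(int maxConcurrentLookups) {
        this(maxConcurrentLookups, InetAddress::getAllByName);
    }

    InetAddress[] resolve(String host) throws UnknownHostException, InterruptedException {
        permits.acquire(); // blocks while maxConcurrentLookups lookups are already in flight
        try {
            return lookup.resolve(host);
        } finally {
            permits.release();
        }
    }
}
```

Each of the 300 workers would call `resolver.resolve(host)` before opening its connection. `HttpURLConnection` still does its own lookup when connecting, but within the JVM's positive cache TTL that lookup should normally be served from the cache filled by the throttled call.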

I also tried the -Djava.net.preferIPv4Stack=true option suggested in the comments.

But I am still getting UnknownHostException.
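Besides the IPv4 flag, the JVM's own DNS caching is controlled by the `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. With the default negative TTL (10 seconds in the stock `java.security` file), one failed lookup for a host is cached, so other threads fetching URLs on that same host fail immediately as well, which could spread a single transient failure across many URLs. That is an assumption about the cause, not a confirmed diagnosis. A minimal sketch of setting the properties programmatically (`DnsCacheConfig` is a hypothetical name; the values are only examples):

```java
import java.security.Security;

/**
 * Hypothetical helper: adjusts the JVM's InetAddress cache via security properties.
 * Call it at the very start of main(), before any lookup or connection happens,
 * because the JDK reads these values when its address cache policy is first set up.
 */
final class DnsCacheConfig {
    private DnsCacheConfig() {}

    /**
     * @param positiveTtlSeconds how long successful lookups are cached (-1 = forever, 0 = never)
     * @param negativeTtlSeconds how long failed lookups are cached (0 = do not cache failures)
     */
    static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }
}
```

For example, `DnsCacheConfig.configure(60, 0)` keeps successful lookups for a minute, so the many URLs that share a host reuse one answer, and stops caching failures, so a retry actually queries DNS again. Whether this helps depends on why the lookups fail in the first place.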

Vanaja Jayaraman
  • Try with -Djava.net.preferIPv4Stack=true (source: http://restlet-discuss.1400322.n2.nabble.com/Apparently-random-UnknownHostException-while-attempting-POST-operation-td7578919.html) – MGorgon Feb 20 '15 at 12:37
  • Had the same issue, found solution here: http://stackoverflow.com/questions/2906745/unknownhostexception-in-java-that-too-only-sometimes – poozmak Sep 30 '16 at 06:37
