I need to find the HTTP response code of URLs in Java. I know this can be done using the URL and HttpURLConnection APIs, and I have gone through previous questions like this and this.

I need to do this on around 2000 links, so speed is the most important requirement. Among those, I have already crawled 150-250 pages using crawler4j, and I don't know a way to get the response code from that library (which means I would have to open connections to those links again with another library just to find the response code).

    Have you tried to write any code of your own? If so, include what you have. If not, what's preventing you from doing so? – Anthony Grist Jun 26 '12 at 14:22
  • Answering your question: I have tried to write code of my own, and I'll surely share it with you. Things which are preventing me from doing so: 1. I am new to Java, so I don't know much about its libraries. 2. I don't know much about how to find answers. I mean, I try Google, previous questions, and discussing with others, and if I can't find anything I ask people like you. (Maybe you can tell me how you find solutions to problems you get stuck on; you could give an example of what you do in such situations, and I may learn from that.) –  Jun 26 '12 at 14:50
  • From your comment it sounds as if you've taken all the steps I would have myself. It just helps if you include this information in your question so we know where to begin answering. If you already have code, you may be nearly there already and a small change would fix the issue - if you post that, we can tell you where you've made the mistakes. And, if we know you're new to Java, we may go into more detail about concepts that we'd gloss over when answering questions for more experienced Java programmers. That doesn't **all** apply in this case, but it might when you're asking other questions. – Anthony Grist Jun 26 '12 at 14:57

2 Answers

In crawler4j, the class WebCrawler has a method handlePageStatusCode, which is exactly what you are looking for: it is called for every fetched URL, so no second connection is needed. Override it and be happy.
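A minimal sketch of such an override (the signature below matches crawler4j's WebCrawler as of the versions current at the time; the subclass name is illustrative, so check it against the version you actually use):

```java
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class StatusLoggingCrawler extends WebCrawler {

    // crawler4j invokes this for every fetched URL, so the status codes of
    // the pages you already crawl come for free -- no extra connection.
    @Override
    protected void handlePageStatusCode(WebURL webUrl, int statusCode,
                                        String statusDescription) {
        System.out.println(statusCode + " " + webUrl.getURL());
    }
}
```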

kutschkem

The answer behind your first link contains everything you need: How to get HTTP response code for a URL in Java?

    URL url = new URL("http://google.com");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");
    connection.connect();

    // e.g. 200 (OK), 404 (Not Found)
    int code = connection.getResponseCode();

The response code is the HTTP code returned by the server.
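The snippet above issues a full GET request; when only the status code matters, a HEAD request with timeouts is usually cheaper, which helps at 2000 links. A rough sketch under that assumption (class and method names here are mine, not from the linked answer; note that some servers reject HEAD, in which case you have to fall back to GET):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusChecker {

    // Ask the server for headers only; HEAD avoids downloading the body.
    static int statusOf(String address) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(5000); // fail fast on dead hosts
        conn.setReadTimeout(5000);
        int code = conn.getResponseCode();
        conn.disconnect();
        return code;
    }

    // Any 2xx code means the request succeeded.
    static boolean isSuccess(int code) {
        return code >= 200 && code < 300;
    }
}
```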

Robert
  • Thank you for your reply. As I mentioned, I have already gone through those links. What I want is something much faster than this (maybe I didn't mention that clearly earlier), and to know if there is a way to get it from crawler4j. –  Jun 26 '12 at 14:45
  • Yes, you mentioned that you had gone through them, but you did not write whether you had gained anything from them. – Robert Jun 26 '12 at 15:46
  • So, is there anything faster than that? Something like: if I have already made a connection to the host ("google.com" in this case), then I don't have to make a connection again for a file in the same domain, like "google.com/xyz.jpg"? –  Jun 27 '12 at 05:47
  • What you are looking for is the HTTP persistent connection feature. BTW: since you don't write exactly what you want to achieve, it is difficult to help you. – Robert Jun 27 '12 at 07:29
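The persistent-connection behaviour Robert mentions combines well with a thread pool for checking ~2000 links quickly: HttpURLConnection reuses keep-alive connections to the same host automatically, as long as response streams are read or closed properly. A rough sketch (the class name, thread count, and the -1 sentinel for unreachable URLs are illustrative choices of mine):

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BulkChecker {

    // Check many URLs concurrently. Requests to the same host on one thread
    // can reuse the keep-alive connection, skipping the TCP handshake.
    static Map<String, Integer> check(List<String> urls, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, Integer> codes = new ConcurrentHashMap<>();
        for (String u : urls) {
            pool.submit(() -> {
                try {
                    HttpURLConnection c =
                            (HttpURLConnection) new URL(u).openConnection();
                    c.setRequestMethod("HEAD");
                    c.setConnectTimeout(5000);
                    c.setReadTimeout(5000);
                    codes.put(u, c.getResponseCode());
                } catch (Exception e) {
                    codes.put(u, -1); // sentinel: unreachable or malformed
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.MINUTES);
        return codes;
    }
}
```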