
I am writing a Java program that uses Apache HttpComponents to load a page and print its HTML to the console. However, the program prints only part of the HTML before throwing this error: Exception in thread "main" java.net.SocketException: socket closed. The portion of the HTML printed before the exception is exactly the same on every run, and the error occurs in this simplified example with Google, Yahoo and Craigslist:

String USERAGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22";
DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet("http://www.craigslist.org");
get.setHeader(HTTP.USER_AGENT,USERAGENT);
HttpResponse page = client.execute(get);
get.releaseConnection();
InputStream stream = page.getEntity().getContent();
try{
    BufferedReader br = new BufferedReader(new InputStreamReader(stream));
    String line = "";
    while ((line = br.readLine()) != null){
        System.out.println(line);
    }
}
finally{
    EntityUtils.consume(page.getEntity());
}
Maythe

1 Answer


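I've found that get.releaseConnection() must not be called until after I have finished reading the HTML. The stream returned by page.getEntity().getContent() reads from the live connection, so releasing the connection first cuts the read short. Moving the call so that it comes immediately after EntityUtils.consume(page.getEntity()) fixes the code above.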
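The ordering issue can be reproduced without HttpClient. The following is only a minimal JDK-only sketch of the same principle, not HttpClient code: a plain java.net.Socket stands in for the pooled connection, and socket.close() stands in for get.releaseConnection(). The class name ReleaseOrderDemo, the loopback server and the three-line body are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a raw socket plays the role of the HTTP connection.
final class ReleaseOrderDemo {

    // Illustrative payload sent by the throwaway server below.
    static final String BODY = "line 1\nline 2\nline 3\n";

    // Starts a one-shot server on a free loopback port that sends BODY to the first client.
    private static ServerSocket startServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                out.write(BODY.getBytes(StandardCharsets.UTF_8));
            } catch (IOException ignored) {
                // The client or server went away early; nothing to do in this sketch.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    // Reads the stream line by line, the same way the question's loop does.
    private static String readAll(InputStream in) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Correct order: read everything first, then give up the connection.
    static String readThenClose() {
        try (ServerSocket server = startServer()) {
            Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
            try {
                return readAll(socket.getInputStream());
            } finally {
                socket.close(); // stands in for get.releaseConnection()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Order from the question: give up the connection, then try to read from its stream.
    static boolean closeThenReadFails() {
        try (ServerSocket server = startServer()) {
            Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
            InputStream in = socket.getInputStream();
            socket.close(); // stands in for calling get.releaseConnection() too early
            try {
                readAll(in);
                return false; // the read unexpectedly succeeded
            } catch (IOException expected) {
                return true;  // the stream is no longer backed by an open socket
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(readThenClose());
        System.out.println("read after close failed: " + closeThenReadFails());
    }
}
```

In the HttpClient version the same ordering applies: read from page.getEntity().getContent() inside the try block, call EntityUtils.consume(page.getEntity()) in the finally block, and only then call get.releaseConnection().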

Maythe
  • Well of course it shouldn't. Releasing the connection and then trying to read data from it never made sense. Too localized. – user207421 May 28 '13 at 23:52
  • In an earlier script I wrote, the delay in the connection actually closing after releaseConnection() was called was long enough that I was actually able to read an entire (albeit tiny) HTML file after having called it. That is what tricked me into thinking Entities saved their content locally. – Maythe Jun 04 '13 at 17:45