I'd like to fetch a webpage and save its content as a string. Is there a library to do that? I want to use the string in a program I am building. It's for websites that don't necessarily provide an RSS feed.
-
[Apache HttpClient](http://hc.apache.org/) – Luiggi Mendoza Jul 03 '13 at 15:59
-
I can't believe you didn't find one. I don't believe you even began to search. [First Google result of 'java fetch webpage'](http://stackoverflow.com/questions/238547/how-do-you-programmatically-download-a-webpage-in-java) – Anti Earth Jul 03 '13 at 16:00
-
@user2516730 you should flag the question as a duplicate. – Luiggi Mendoza Jul 03 '13 at 16:02
-
HtmlUnit might also help you. – user902691 Jul 03 '13 at 16:05
-
@antiearth Thanks, maybe the keywords I used to search didn't match the problem I was having. – Jul 03 '13 at 16:05
-
@LuiggiMendoza It would be nice if you provided an example. Really. – giannis christofakis Jul 03 '13 at 16:17
-
@yannishristofakis if you follow the links I've provided, you'll find lots of examples of how to do what the OP asked. Really. – Luiggi Mendoza Jul 03 '13 at 16:37
-
@LuiggiMendoza OK, thanks. – giannis christofakis Jul 03 '13 at 16:38
3 Answers
I think you need this:
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils; // from Apache Commons IO

URL url = new URL("http://www.google.com/");
URLConnection con = url.openConnection();
// Not con.getContentEncoding(): that reports e.g. "gzip", not the charset.
// The charset is inside con.getContentType(), e.g. "text/html; charset=UTF-8"; here we simply assume UTF-8.
String encoding = "UTF-8";
try (InputStream in = con.getInputStream()) { // closes the stream when done
    String body = IOUtils.toString(in, encoding);
    System.out.println(body);
}
-
Be careful, `con.getContentType()` should be used instead of `con.getContentEncoding()`, but it returns something like `"text/html; charset=UTF-8"` so this value must be parsed in order to extract the actual encoding (I've added a comment on the code above to reflect this) – xav Aug 21 '16 at 05:25
-
See http://stackoverflow.com/questions/5938007/what-is-the-difference-between-content-type-charset-x-and-content-encoding-x concerning my previous comment (`con.getContentEncoding()` is used for things like "gzip", "compress", ..., not for the character encoding) – xav Dec 21 '16 at 16:12
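As the comments above point out, the charset has to be parsed out of `con.getContentType()`. Below is a minimal sketch of that parsing, assuming a JDK-only setup: it uses `InputStream.readAllBytes` (Java 9+) in place of Commons IO's `IOUtils.toString`, and falls back to UTF-8 like the answer does. `CharsetFromContentType`, `charsetOf` and `readBody` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetFromContentType {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=UTF-8". Falls back to the given default when the
    // header is missing, has no charset, or names an illegal/unsupported charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="utf-8"
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // IllegalCharsetNameException and UnsupportedCharsetException both extend this
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream and decodes it with the given charset (Java 9+).
    static String readBody(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(cs + " -> " + readBody(new ByteArrayInputStream(bytes), cs));
    }
}
```

With the answer's `URLConnection`, the two helpers would be combined as `readBody(in, charsetOf(con.getContentType(), StandardCharsets.UTF_8))` inside the same try-with-resources block.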
You can use Apache HttpComponents (HttpClient 4.3 or later):
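import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.http.HttpEntity;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

HttpGet httpget = new HttpGet("http://www.google.gr");
// try-with-resources closes both the response and the client, so no explicit close() is needed
try (CloseableHttpClient httpclient = HttpClients.createDefault();
     CloseableHttpResponse response = httpclient.execute(httpget)) {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        System.out.println(EntityUtils.toString(entity));
    }
} catch (IOException ex) {
    Logger.getLogger(HttpClient.class.getName()).log(Level.SEVERE, null, ex);
}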

giannis christofakis
-
Hello. Do you know if this is slower or faster than the accepted answer? – dentex May 28 '14 at 18:47
-
@dentex I don't think you gain much in performance. If speed is what concerns you, don't forget that you are accessing a remote resource, so how fast the result arrives is largely outside your control. `Apache HttpComponents` gives you much more functionality, such as asynchronous calls. It's up to you. – giannis christofakis May 30 '14 at 13:46
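The comment above mentions asynchronous calls. A comparable non-blocking fetch can be sketched with the JDK's own `java.net.http.HttpClient` (Java 11+); note this is a different client swapped in for illustration, not Apache HttpComponents' async API. So that the sketch runs offline, it assumes a throwaway local `com.sun.net.httpserver.HttpServer` stands in for a real website. `AsyncFetchDemo`, `startLocalServer` and `fetchAsync` are hypothetical names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class AsyncFetchDemo {

    // Starts a local server on a free port that always answers with the given HTML.
    static HttpServer startLocalServer(String html) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = html.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Sends the GET request without blocking the caller. BodyHandlers.ofString()
    // decodes the body with the charset from Content-Type (UTF-8 if absent).
    static CompletableFuture<String> fetchAsync(HttpClient client, URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                     .thenApply(HttpResponse::body);
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startLocalServer("<html><body>Hello</body></html>");
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            CompletableFuture<String> page = fetchAsync(client, uri);
            // Other work could run here while the request is in flight.
            System.out.println(page.join()); // blocks only when the result is needed
        } finally {
            server.stop(0);
        }
    }
}
```

For a real site, pass something like `URI.create("http://www.google.gr")` to `fetchAsync` instead of the local URI. Whether this is faster than a blocking call still depends mostly on the remote server, as the comment says; the benefit is that the calling thread is free while waiting.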