0

I'd like to fetch a webpage and save its content as a string. Is there a library to do that? I want to use the string in a program I am building. It's for websites that don't necessarily provide an RSS feed.

giannis christofakis

3 Answers

3

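I think you need this:

    // Needs java.io.InputStream, java.net.URL, java.net.URLConnection and org.apache.commons.io.IOUtils (Apache Commons IO).
    URL url = new URL("http://www.google.com/");
    URLConnection con = url.openConnection();
    // The charset is a parameter of Content-Type (e.g. "text/html; charset=UTF-8"); getContentEncoding() reports values like "gzip" instead.
    String contentType = con.getContentType();
    int pos = contentType == null ? -1 : contentType.toLowerCase().indexOf("charset=");
    // Fall back to UTF-8 when the server declares no charset.
    String encoding = pos < 0 ? "UTF-8" : contentType.substring(pos + 8).split(";")[0].replace("\"", "").trim();
    try (InputStream in = con.getInputStream()) {
        String body = IOUtils.toString(in, encoding);
        System.out.println(body);
    }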
xav
M Sach
  • Be careful, `con.getContentType()` should be used instead of `con.getContentEncoding()`, but it returns something like `"text/html; charset=UTF-8"` so this value must be parsed in order to extract the actual encoding (I've added a comment on the code above to reflect this) – xav Aug 21 '16 at 05:25
  • See http://stackoverflow.com/questions/5938007/what-is-the-difference-between-content-type-charset-x-and-content-encoding-x concerning my previous comment (`con.getContentEncoding()` is used for things like "gzip", "compress", ... not encoding) – xav Dec 21 '16 at 16:12
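To illustrate the comments above about parsing `con.getContentType()`, here is a minimal sketch of pulling the charset out of a value like `"text/html; charset=UTF-8"`. The class and method names (`ContentTypeCharset`, `charsetOf`) are made up for this example, and falling back to UTF-8 is an assumption, not something the header guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or UTF-8 (an assumed default) when the header is missing, has no
    // charset parameter, or names a charset this JVM does not support.
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        // Parameters follow the media type, separated by ';'
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // Case-insensitive check for the "charset=" prefix
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("text/html"));
    }
}
```

The returned `Charset` could then be used when decoding the response bytes, for example with `new String(bytes, charset)`.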
1

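May I suggest JSoup? Note that the URL passed to `connect` must include the protocol; `doc.html()` then gives you the page as a string.

    Document doc = Jsoup.connect("http://www.google.com").get();
    String html = doc.html();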
merours
0

You can use Apache HttpComponents:

    // Assumes HttpClient 4.x (org.apache.http.*)
    HttpGet httpget = new HttpGet("http://www.google.gr");
    // try-with-resources closes both the client and the response
    try (CloseableHttpClient httpclient = HttpClients.createDefault();
         CloseableHttpResponse response = httpclient.execute(httpget)) {
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            // Reads the whole response body into a String
            String body = EntityUtils.toString(entity);
            System.out.println(body);
        }
    } catch (IOException ex) {
        Logger.getLogger(HttpClient.class.getName()).log(Level.SEVERE, null, ex);
    }
giannis christofakis