3

I'm writing some additional classes for an existing GWT project. I need to:

  • Request a URL
  • Read in the returned webpage so that I can perform operations on it.

The returned page is very simple HTML, so parsing it shouldn't be difficult; I just need to get the data first.

How do I do this in Java? Which packages should I be looking at?

Bozho
Federer

4 Answers

8

With the native Java API, the easiest way to read from a URL is java.net.URL#openStream(). Here's a basic example:

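import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// InputStream#readAllBytes() requires Java 9 or newer.
try (InputStream response = new URL("https://www.stackoverflow.com").openStream()) {
    String body = new String(response.readAllBytes(), StandardCharsets.UTF_8);
    System.out.println(body);
}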

You can feed the InputStream to any DOM/SAX parser you like. Most parsers accept an InputStream, or even a URL, as an argument, directly or indirectly. Jsoup is one of the better HTML parsers.
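As a rough illustration of handing an InputStream to a DOM parser, here is a minimal sketch that uses only the JDK's built-in javax.xml.parsers API. It assumes the page is well-formed enough to parse as XML, which "very simple HTML" may or may not be; for real-world HTML a lenient parser such as Jsoup is the safer choice. The class name, the method name extractCells, and the sample markup are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class SimpleHtmlDomExample {

    // Parses well-formed markup from the stream and returns the trimmed text of every <td> element.
    // Hypothetical helper: it throws if the markup is not well-formed XML.
    static List<String> extractCells(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList cells = doc.getElementsByTagName("td");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            values.add(cells.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the stream returned by URL#openStream(); the markup is invented sample data.
        String page = "<html><body><table><tr><td>alpha</td><td>42</td></tr></table></body></html>";
        try (InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(extractCells(in)); // prints [alpha, 42]
        }
    }
}
```

In practice you would pass the stream from openStream() instead of the in-memory sample.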

If you want more control or a more self-documenting API, you can use java.net.http.HttpClient, available since Java 11. It does get verbose quickly when all you want is the response body:

HttpClient client = HttpClient.newBuilder().build();
HttpRequest request = HttpRequest.newBuilder().GET().uri(URI.create("https://stackoverflow.com")).build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
String body = response.body();
System.out.println(body);
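To give an idea of what "more control" might look like, here is a sketch that sets timeouts, follows redirects, and checks the status code before using the body. So that it runs without network access, it fetches from a throwaway local server built with the JDK's com.sun.net.httpserver.HttpServer. The class name, the helper names fetch and startDemoServer, the timeout values, and the sample page are all assumptions made for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

class HttpClientFetchExample {

    // Fetches the body at the given URI and fails on any status other than 200.
    // The timeout values are arbitrary choices for the sketch.
    static String fetch(URI uri) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    // Starts a local server on a free port that answers every request with the given HTML.
    // It exists only so that the example does not depend on an external site.
    static HttpServer startDemoServer(String html) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        server.createContext("/", exchange -> {
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, bytes.length);
            try (var out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startDemoServer("<html><body><p>hello</p></body></html>");
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            System.out.println(fetch(uri));
        } finally {
            server.stop(0);
        }
    }
}
```

Against a real site you would call fetch with the actual URI and drop the demo server.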


BalusC
1

For HTML pages you should use HttpClient.

For Web services, you need a framework like CXF.

kgiannakakis
0

Apache Commons HttpClient, although very good, is considered obsolete. Apache HttpComponents is its successor.

Bozho
0

If you want to do this on the client, take a look at GWT's HTTP types, but be aware that you are then subject to the same-origin policy.

wilth