167

I'm trying to find Java's equivalent to Groovy's:

String content = "http://www.google.com".toURL().getText();

I want to read content from a URL into a string. I don't want to pollute my code with buffered streams and loops for such a simple task. I looked into Apache's HttpClient, but I don't see a one- or two-line implementation there either.

Eric Leschinski
Pomponius
  • 7
    Why not just create a utility class that encapsulates all that "polluted" buffered streams and loops? You could also use that class to handle things like the socket closing before the stream completes and to handle I/O blocks over a slow connection. After all, this is OO - encapsulate the functionality and hide it from your main class. – Jonathan B Dec 01 '10 at 20:31
  • 1
    It cannot be done in one or two lines. – Thorbjørn Ravn Andersen Dec 01 '10 at 20:38
  • See ZhekaKozlov's 3-line answer: tested, and no external dependencies – StevenWernerCS Jul 13 '21 at 19:03

12 Answers

150

Now that some time has passed since the original answer was accepted, there's a better approach:

String out = new Scanner(new URL("http://www.google.com").openStream(), "UTF-8").useDelimiter("\\A").next();

If you want a slightly fuller implementation, which is not a single line, do this:

public static String readStringFromURL(String requestURL) throws IOException
{
    try (Scanner scanner = new Scanner(new URL(requestURL).openStream(),
            StandardCharsets.UTF_8.toString()))
    {
        scanner.useDelimiter("\\A");
        return scanner.hasNext() ? scanner.next() : "";
    }
}
ccleve
  • 18
    Just don't forget you need to call `Scanner#close()` later. – Marcelo Dec 21 '12 at 03:55
  • if the compiler gives a leak warning you should split the statement as here http://stackoverflow.com/questions/11463327/is-this-a-memory-leak-or-a-false-positive – M.C. Apr 28 '13 at 06:40
  • 2
    The regular expression \\A matches the beginning of input. This tells Scanner to tokenize the entire stream, from beginning to (illogical) next beginning. – Rune May 05 '13 at 10:00
  • 7
    Neat, but fails if the webpage returns no content (""). You need `String result = scanner.hasNext() ? scanner.next() : "";` to handle that. – NateS Mar 16 '14 at 13:25
  • 1
    Isn’t it necessary to close all of the resources properly? `String s(URL u)throws IOException{HttpURLConnection c=null;InputStream i=null;Scanner s=null;try{c=(HttpURLConnection) u.openConnection();i=c.getInputStream();s=new Scanner(i,"UTF-8").useDelimiter("\\A");return s.hasNext()?s.next():"";}finally{if(s!=null)s.close();if(i!=null)try{i.close();}catch(IOException e){}if(c != null)c.disconnect();}}` Perhaps you also want to set some timeouts: `c.setConnectTimeout(5000);c.setReadTimeout(25000);` – Matthias Ronge Feb 03 '15 at 07:55
  • @Marcelo What do you mean like this? Seems you would have to split it into multiple statements to close an unassigned value. – Erik Humphrey May 29 '17 at 14:32
  • 3
    @ccleve it would be useful to add imports here, there are multiple Scanners and URLs in Java – kiedysktos Jun 06 '17 at 07:23
  • 2
    @ccleve can you update the link "This explains the \\A:"? – Imaskar Jan 25 '18 at 08:43
  • This answer is dangerous, as Scanner will swallow IOExceptions produced by the underlying stream and you will not get the full content of the resource. In fact, you must call `scanner.ioException()` to see if there was an exception. From Scanner's docs: " If an invocation of the underlying readable's Readable.read(java.nio.CharBuffer) method throws an IOException then the scanner assumes that the end of the input has been reached. The most recent IOException thrown by the underlying readable can be retrieved via the ioException() method." This has bitten us before. – Jon Chase Aug 10 '18 at 14:34
  • This works with redirects too, for what it's worth ;-) – Brad Parks Aug 23 '18 at 19:37
  • 1
    Note: According to the [documentation](https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html) _The try-with-resources statement ensures that each resource is closed at the end of the statement._ – Crystalzord Nov 28 '18 at 13:45
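Following up on the comment above about Scanner swallowing IOExceptions: here is a minimal sketch of how the Scanner approach could check `Scanner#ioException()` before returning, so a truncated read is reported instead of silently accepted. The class name `ScannerUrlReader` and the `readString(InputStream)` helper are illustrative names, not part of the answer, and UTF-8 is assumed as in the answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative only.
final class ScannerUrlReader {

    // Reads the whole stream as UTF-8 text and closes it.
    static String readString(InputStream in) throws IOException {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A"); // \A matches only the start of input, so the whole stream is one token
            String text = scanner.hasNext() ? scanner.next() : "";
            // Scanner treats a read failure as end of input, so ask it explicitly.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
            return text;
        }
    }

    static String readStringFromURL(String requestURL) throws IOException {
        return readString(new URL(requestURL).openStream());
    }
}
```

Closing the Scanner also closes the underlying stream, so the try-with-resources block is enough here.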
99

This answer refers to an older version of Java. You may want to look at ccleve's answer.


Here is the traditional way to do this:

import java.net.*;
import java.io.*;

public class URLConnectionReader {
    public static String getText(String url) throws Exception {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        BufferedReader in = new BufferedReader(
                                new InputStreamReader(
                                    connection.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        while ((inputLine = in.readLine()) != null) 
            response.append(inputLine);

        in.close();

        return response.toString();
    }

    public static void main(String[] args) throws Exception {
        String content = URLConnectionReader.getText(args[0]);
        System.out.println(content);
    }
}

As @extraneon has suggested, IOUtils from Apache Commons IO lets you do this in a very elegant way that's still in the Java spirit:

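 InputStream in = new URL( "http://jakarta.apache.org" ).openStream();

 try {
   // Pass the charset explicitly; IOUtils.toString(InputStream) is deprecated.
   // UTF-8 is an assumption here, use the encoding the server actually sends.
   System.out.println( IOUtils.toString( in, StandardCharsets.UTF_8 ) );
 } finally {
   IOUtils.closeQuietly(in);
 }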
matt
Joseph Weissman
  • 5
    You could rename the main method to, say `getText`, pass URL string as a parameter and have a one-liner: `String content = URLConnectionReader.getText("http://www.yahoo.com/");` – Goran Jovic Dec 01 '10 at 20:27
  • 7
    The string will not contain any line-termination characters (because `BufferedReader.readLine()` removes them), so it will not be exactly the content of the URL. – Benoît Guédas Aug 21 '13 at 07:55
  • @Benoit Guedas so how to keep the line breaks ? – user1788736 Nov 21 '17 at 23:02
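On the question above about keeping the line breaks: one way, sketched here under the assumption that the content is UTF-8, is to copy characters in chunks instead of calling `readLine()`, so line terminators pass through unchanged. The class name `RawTextReader` and the `readAll` helper are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class RawTextReader {

    // Copies every decoded character, including \r and \n, and closes the stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[8192];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    // Same shape as getText(String) in the answer, but assumes UTF-8 content.
    static String getText(String url) throws IOException {
        return readAll(new URL(url).openStream(), StandardCharsets.UTF_8);
    }
}
```

Because nothing is split into lines, the result should match the decoded response body exactly, whatever line-ending convention the server uses.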
92

Or just use Apache Commons IOUtils.toString(URL url), or the variant that also accepts an encoding parameter.

Ruslan López
steve
  • 14
    +1 Thanks, this worked perfectly. One line of code AND it closes the stream! Note that `IOUtils.toString(URL)` is deprecated. `IOUtils.toString(URL url, String encoding)` is preferred. – gMale May 21 '13 at 00:13
  • 1
    `IOUtils.toString(url, (Charset) null)` to reach similar result. – franckysnow Feb 04 '15 at 14:57
  • 4
    One line of code, and tens of megabytes of extraneous class files that are now in your runtime. Including a gigantic library to avoid writing a few (actually, one) line of code is not a great decision. – Jeffrey Blattman Nov 22 '17 at 01:19
  • 2
    @JeffreyBlattman if you are using it only once in your application, it's probably not such a smart decision, but if you use it more frequently, along with other things from the commons-io package, then it might be a smart decision again. It also depends on the application you are writing. If it's a mobile or desktop app, you might think twice about bloating the memory footprint with additional libraries. If it's a server application running on a 64 GB RAM machine, then just ignore this 10 MB: memory is cheap nowadays, and whether the basic footprint is 1.5% or 2% of your total memory doesn't matter. – big data nerd Apr 25 '18 at 07:09
  • I liked that solution... until I realised it doesn't follow redirection :( – Gael Sep 28 '20 at 06:10
35

There's an even better way as of Java 9:

URL u = new URL("http://www.example.com/");
try (InputStream in = u.openStream()) {
    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}

Like the original Groovy example, this assumes that the content is UTF-8 encoded. (If you need something more clever than that, you need to create a URLConnection and use it to figure out the encoding.)
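As a rough illustration of the "figure out the encoding" case, the sketch below reads the `charset` parameter from the Content-Type header reported by `URLConnection.getContentType()` and falls back to UTF-8 when it is missing or unknown. The class name `CharsetAwareReader` and the `charsetFrom` helper are made up for this example, and the header parsing is deliberately simple rather than a full implementation of the HTTP grammar.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class CharsetAwareReader {

    // Extracts the charset from a value such as "text/html; charset=ISO-8859-1".
    // Falls back to UTF-8 (an assumption) if none is given or the name is unknown.
    static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    static String read(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] bytes = in.readAllBytes(); // Java 9+
            return new String(bytes, charsetFrom(conn.getContentType()));
        }
    }
}
```

This only looks at the HTTP header; a page that declares its encoding solely in an HTML meta tag would still be decoded as UTF-8.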

Sean Reilly
  • 1
    Thanks, this was exactly what I was looking for. It can also be used with `getClass().getResourceAsStream(...)` to open text files inside the jar. – rjh Jun 06 '20 at 19:28
  • Nice but if you need to add a header this will not do – Bostone Sep 15 '20 at 16:30
  • 1
    @Bostone true, but the same thing is true for the original groovy example in the question. – Sean Reilly Sep 16 '20 at 08:49
27

Now that more time has passed, here's a way to do it in Java 8:

URLConnection conn = url.openConnection();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    pageText = reader.lines().collect(Collectors.joining("\n"));
}
Hat
Jeanne Boyarsky
  • When using this example on the `http://www.worldcat.org/webservices/catalog/search/opensearch` webservice, I'm getting only the first two lines of xml. – Ortomala Lokni Apr 07 '16 at 14:13
  • The 400 error is because you need a key to use this webservice. The problem is that this webservice sends a bit of XML, then takes several seconds to do some processing, and then sends the second part of the XML. The InputStream is closed during the interval and not all content is consumed. I've solved the problem using the Apache HttpComponents library https://hc.apache.org/httpcomponents-client-ga/ – Ortomala Lokni Apr 11 '16 at 07:12
  • I use this source code in a CORS proxy, URLConnection allows to get the content encoding, it's helpful. @OrtomalaLokni I have a similar problem when I try to download a web page whereas it works when it points to a file available online (an RSS file for example). Thank you for the suggestion. I won't probably use this library but it might be a good source of inspiration to solve my problem as it's open source. – gouessej Aug 07 '20 at 21:45
  • In terms of performance, is this the best option? Or which one do you think it is? – Daniel Henao Mar 16 '21 at 03:10
8

Additional example using Guava:

URL xmlData = ...
String data = Resources.toString(xmlData, Charsets.UTF_8);
takacsot
  • 2
    The Guava docs say ([link](http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/Resources.html)): Note that even though these methods use {@link URL} parameters, they are usually not appropriate for HTTP or other non-classpath resources – gaal Aug 11 '15 at 07:30
6

Java 11+:

URI uri = URI.create("http://www.google.com");
HttpRequest request = HttpRequest.newBuilder(uri).build();
String content = HttpClient.newHttpClient().send(request, BodyHandlers.ofString()).body();
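A slightly fuller sketch of the same idea, for cases where redirects, timeouts, or error statuses matter. A client built with default settings does not follow redirects, so this version opts in explicitly. The class name `HttpClientFetch`, the method names, and the timeout values are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; names and timeout values are illustrative only.
final class HttpClientFetch {

    // Builds a client that follows redirects (except HTTPS to HTTP) and limits connect time.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    // Sends a GET request and returns the body, failing on non-2xx statuses.
    static String fetch(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
```

A call would then look like `String content = HttpClientFetch.fetch(HttpClientFetch.newClient(), URI.create("http://www.google.com"));`. Reusing one client across requests is generally preferable to building a new one each time.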
ZhekaKozlov
4

The following works with Java 7/8 and secure URLs, and shows how to add a cookie to your request as well. Note that this is mostly a direct copy of this other great answer on this page, with a cookie example added and a clarification that it works with secure URLs as well ;-)

If you need to connect to a server with an invalid or self-signed certificate, this will throw security errors unless you import the certificate. If you need this functionality, you could consider the approach detailed in this answer to this related question on StackOverflow.

Example

String result = getUrlAsString("https://www.google.com");
System.out.println(result);

outputs

<!doctype html><html itemscope="" .... etc

Code

import java.net.URL;
import java.net.URLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;

public static String getUrlAsString(String url)
{
    try
    {
        URL urlObj = new URL(url);
        URLConnection con = urlObj.openConnection();

        con.setDoInput(true); // we want to read the response (true is already the default)
        con.setRequestProperty("Cookie", "myCookie=test123");
        con.connect();

        BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));

        StringBuilder response = new StringBuilder();
        String inputLine;

        String newLine = System.getProperty("line.separator");
        while ((inputLine = in.readLine()) != null)
        {
            response.append(inputLine).append(newLine);
        }

        in.close();

        return response.toString();
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
}
Community
Brad Parks
4

If you have the input stream (see Joe's answer), also consider IOUtils.toString(inputStream).

http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream)

extraneon
3

Here's Jeanne's lovely answer, but wrapped in a tidy function for muppets like me:

private static String getUrl(String aUrl) throws MalformedURLException, IOException
{
    String urlData = "";
    URL urlObj = new URL(aUrl);
    URLConnection conn = urlObj.openConnection();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) 
    {
        urlData = reader.lines().collect(Collectors.joining("\n"));
    }
    return urlData;
}
Dave
2

URL to String in pure Java

Example call to get the payload from an HTTP GET request

 String str = getStringFromUrl(new URL("YourUrl"));

Implementation

You can use the method described in this answer on how to read a URL to an InputStream, and combine it with this answer on how to read an InputStream to a String.

The outcome will be something like

public String getStringFromUrl(URL url) throws IOException {
        return inputStreamToString(urlToInputStream(url,null));
}

public String inputStreamToString(InputStream inputStream) throws IOException {
    try(ByteArrayOutputStream result = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[1024];
        int length;
        while ((length = inputStream.read(buffer)) != -1) {
            result.write(buffer, 0, length);
        }

        return result.toString(StandardCharsets.UTF_8.name()); // the String overload also compiles before Java 10
    }
}

private InputStream urlToInputStream(URL url, Map<String, String> args) {
    HttpURLConnection con = null;
    InputStream inputStream = null;
    try {
        con = (HttpURLConnection) url.openConnection();
        con.setConnectTimeout(15000);
        con.setReadTimeout(15000);
        if (args != null) {
            for (Entry<String, String> e : args.entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
        }
        con.connect();
        int responseCode = con.getResponseCode();
        /* By default the connection will follow redirects. The following
         * block is only entered if the implementation of HttpURLConnection
         * does not perform the redirect. The exact behavior depends on
         * the actual implementation (e.g. sun.net).
         * !!! Attention: This block allows the connection to
         * switch protocols (e.g. HTTP to HTTPS), which is not the
         * default behavior. See: https://stackoverflow.com/questions/1884230
         * for more info!!!
         */
        if (responseCode < 400 && responseCode > 299) {
            String redirectUrl = con.getHeaderField("Location");
            try {
                URL newUrl = new URL(redirectUrl);
                return urlToInputStream(newUrl, args);
            } catch (MalformedURLException e) {
                // Location was relative: resolve it against the current URL (this keeps the port).
                URL newUrl = new URL(url, redirectUrl);
                return urlToInputStream(newUrl, args);
            }
        }
        /*!!!!!*/
        
        inputStream = con.getInputStream();
        return inputStream;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Pros

  • It is pure Java

  • It can be easily enhanced by adding different headers as a map (instead of passing a null object, like the example above does), authentication, etc.

  • Handling of protocol switches is supported
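To make the redirect handling above a bit more concrete, here is a small sketch of how a Location header value could be resolved against the request URL using `java.net.URI`, which covers absolute URLs, host-relative paths, and path-relative values alike. The class name `RedirectTargets` and its method are hypothetical, and `example.com` is only a placeholder host.

```java
import java.net.URI;

// Hypothetical helper class; the names are illustrative only.
final class RedirectTargets {

    // Resolves a Location header value against the URI that produced the redirect.
    static URI resolve(URI requestUri, String location) {
        if (location == null || location.isEmpty()) {
            throw new IllegalArgumentException("Redirect response without a Location header");
        }
        // Absolute values are returned as-is; relative ones keep the scheme, host and port.
        return requestUri.resolve(location.trim());
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com:8080/a/b.html");
        System.out.println(resolve(base, "/c"));                       // http://example.com:8080/c
        System.out.println(resolve(base, "c.html"));                   // http://example.com:8080/a/c.html
        System.out.println(resolve(base, "https://other.example/x"));  // https://other.example/x
    }
}
```

In `urlToInputStream`, something like `resolve(url.toURI(), redirectUrl).toURL()` could stand in for the try/catch around `new URL(redirectUrl)`, at the cost of handling `URISyntaxException`.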

jschnasse
0

Here's how you can do it in Kotlin:

val body = URL(WEBSITE_URL)
    .openStream()
    .let { Scanner(it, "UTF-8") }
    .use {
        it.useDelimiter("\\A") // RegEx that matches the beginning
        if (it.hasNext()) it.next() else ""
    }
digory doo