115

I can't understand why Java's HttpURLConnection does not follow an HTTP redirect from an HTTP to an HTTPS URL. I use the following code to get the page at https://httpstat.us/:

import java.net.URL;
import java.net.HttpURLConnection;
import java.io.InputStream;

public class Tester {

    public static void main(String argv[]) throws Exception{
        InputStream is = null;

        try {
            String httpUrl = "http://httpstat.us/301";
            URL resourceUrl = new URL(httpUrl);
            HttpURLConnection conn = (HttpURLConnection)resourceUrl.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.connect();
            is = conn.getInputStream();
            System.out.println("Original URL: "+httpUrl);
            System.out.println("Connected to: "+conn.getURL());
            System.out.println("HTTP response code received: "+conn.getResponseCode());
            System.out.println("HTTP response message received: "+conn.getResponseMessage());
        } finally {
            if (is != null) is.close();
        }
    }
}

The output of this program is:

Original URL: http://httpstat.us/301
Connected to: http://httpstat.us/301
HTTP response code received: 301
HTTP response message received: Moved Permanently

A request to http://httpstat.us/301 returns the following (shortened) response (which seems absolutely right!):

HTTP/1.1 301 Moved Permanently
Cache-Control: private
Content-Length: 21
Content-Type: text/plain; charset=utf-8
Location: https://httpstat.us

Unfortunately, Java's HttpURLConnection does not follow the redirect!

Note that if you change the original URL to HTTPS (https://httpstat.us/301), Java will follow the redirect as expected!?

sleske
Shcheklein
  • 1
    Hi, I edited your question for clarity and to point out that the redirect to HTTPS in particular is the problem. Also, I changed the bit.ly domain to a different one, as bit.ly is blacklisted in questions. Hope you don't mind, feel free to re-edit. – sleske Sep 26 '19 at 14:20

6 Answers

134

Redirects are followed only if they use the same protocol. (See the followRedirect() method in the source.) There is no way to disable this check.

Even though we know it mirrors HTTP, from the HTTP protocol point of view, HTTPS is just some other, completely different, unknown protocol. It would be unsafe to follow the redirect without user approval.

For example, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.
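
The same-protocol rule can be illustrated with a small check. This is only a sketch of the idea, not the JDK's actual followRedirect() code; the class name `RedirectCheck` and the method `isCrossProtocol` are made up for illustration, and the request URL is assumed to be absolute.

```java
import java.net.URI;

class RedirectCheck {

    // Returns true when following `location` would switch protocols,
    // e.g. from http to https. `requestUrl` must be an absolute URL.
    static boolean isCrossProtocol(String requestUrl, String location) {
        URI base = URI.create(requestUrl);
        URI target = base.resolve(location); // also handles relative Location values
        return !base.getScheme().equalsIgnoreCase(target.getScheme());
    }

    public static void main(String[] args) {
        // The redirect from the question: http -> https
        System.out.println(isCrossProtocol("http://httpstat.us/301", "https://httpstat.us")); // true
        // A relative Location stays on the same protocol
        System.out.println(isCrossProtocol("http://httpstat.us/301", "/200"));                // false
    }
}
```

An application that wants to follow such redirects can run a check like this on the Location header and then decide for itself whether the protocol change is acceptable.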

erickson
  • 64
    Thanks. I've just found confirmation: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4620571 . Namely: "After discussion among Java Networking engineers, it is felt that we shouldn't automatically follow redirect from one protocol to another, for instance, from http to https and vise versa, doing so may have serious security consequences. Thus the fix is to return the server responses for redirect. Check response code and Location header field value for redirect information. It's the application's responsibility to follow the redirect." – Shcheklein Dec 10 '09 at 22:41
  • 2
    But does it follow redirect from http to http or https to https? Even that would be wrong. Isn't it? – Sudarshan Bhat Oct 31 '12 at 06:15
  • @Enigma You can configure that behavior [globally](http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html#setFollowRedirects(boolean)) or on a [per-instance](http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html#setInstanceFollowRedirects(boolean)) basis. By default, it *does* follow redirects if the scheme doesn't change. – erickson Oct 31 '12 at 17:33
  • @erickson That *only* applies to redirecting across the same protocol, right? – Joshua Davis Feb 04 '13 at 18:18
  • 7
    @JoshuaDavis Yes, it only applies to redirects to the same protocol. An `HttpURLConnection` won't automatically follow redirects to a different protocol, even if the redirect flag is set. – erickson Feb 04 '13 at 18:33
  • Seems like this is not only true for change of protocol, but as well when method changes. I just found out that **a redirect after `POST` is not followed** automatically (Java SE 7). – hgoebl Jan 27 '14 at 11:39
  • 10
    Java Networking engineers could offer a setFollowTransProtocol(true) option because if we need it we will program it anyway. FYI web browsers, curl and wget and many more follow redirects from HTTP to HTTPS and vice-versa. – supercobra Oct 22 '14 at 02:31
  • 22
    Nobody sets up auto-login on HTTPS and then expects HTTP to be "anonymous". That's nonsensical. It's perfectly safe and normal to follow redirects from HTTP to HTTPS (not the other way around). This is just a typically bad Java API. – Glenn Maynard Mar 27 '16 at 06:34
  • Edited to clarify that HttpURLConnection will not follow cross-protocol redirects - it's in the code :-). – sleske Sep 26 '19 at 14:33
66

HttpURLConnection by design won't automatically redirect from HTTP to HTTPS (or vice versa). Following the redirect may have serious security consequences. SSL (hence HTTPS) creates a session that is unique to the user. This session can be reused for multiple requests. Thus, the server can track all of the requests made from a single person. This is a weak form of identity and is exploitable. Also, the SSL handshake can ask for the client's certificate. If sent to the server, then the client's identity is given to the server.

As erickson points out, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.

The programmer has to take extra steps to ensure that credentials, client certificates or SSL session id will not be sent before redirecting from HTTP to HTTPS. The default is to send these. If the redirection hurts the user, do not follow the redirection. This is why automatic redirect is not supported.

With that understood, here's the code which will follow the redirects.

  // Assumes java.net.*, java.io.* and java.util.* are imported, and that
  // `url` is a String holding the address to fetch.
  URL resourceUrl, base, next;
  Map<String, Integer> visited;
  HttpURLConnection conn;
  String location;
  int times;

  ...
  visited = new HashMap<>();

  while (true)
  {
     // Count visits per URL so that a redirect cycle is detected
     times = visited.compute(url, (key, count) -> count == null ? 1 : count + 1);

     if (times > 3)
        throw new IOException("Stuck in redirect loop");

     resourceUrl = new URL(url);
     conn        = (HttpURLConnection) resourceUrl.openConnection();

     conn.setConnectTimeout(15000);
     conn.setReadTimeout(15000);
     conn.setInstanceFollowRedirects(false);   // Handle every redirect ourselves so the logic below sees it
     conn.setRequestProperty("User-Agent", "Mozilla/5.0...");

     switch (conn.getResponseCode())
     {
        case HttpURLConnection.HTTP_MOVED_PERM:
        case HttpURLConnection.HTTP_MOVED_TEMP:
           location = conn.getHeaderField("Location");
           // Caution: decoding can corrupt a Location that is already percent-encoded;
           // drop this line if redirected requests start returning 404
           location = URLDecoder.decode(location, "UTF-8");
           base     = new URL(url);
           next     = new URL(base, location);  // Deal with relative URLs
           url      = next.toExternalForm();
           continue;
     }

     break;
  }

  is = conn.getInputStream();   // HttpURLConnection has no openStream(); that method belongs to URL
  ...
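
Whether the URLDecoder.decode step helps depends on what the server sends in the Location header. As a hedged illustration, the sketch below resolves a Location value against the request URL without decoding it, and prints the decoded form next to it for comparison. The class name `LocationResolver`, the method `resolve` and the example.com URL are made up for this example, and `URLDecoder.decode(String, Charset)` needs Java 10 or later.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class LocationResolver {

    // Resolve a Location header value against the URL that was requested,
    // leaving any percent-encoding in the header untouched.
    static String resolve(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location).toASCIIString();
    }

    public static void main(String[] args) {
        String requestUrl = "http://example.com/audio/index.html"; // hypothetical request URL
        String location   = "files/LR-001B-%E5%BA%8F.mp3";         // relative, percent-encoded

        // Escapes survive, so the result can be used directly for the next request
        System.out.println(resolve(requestUrl, location));

        // Decoding first yields raw multi-byte characters, and turns '+' into a space
        System.out.println(URLDecoder.decode(location, StandardCharsets.UTF_8));
    }
}
```

Note that `URI.create` rejects strings containing characters that are illegal in a URI, such as an unencoded space, so a server that sends such a Location would need separate handling.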
Nathan
  • This is the only solution that works for more than one redirect. Thank you! – Roger Alien Jul 31 '16 at 03:37
  • This works beautifully for multiple redirects (HTTPS API -> HTTP -> HTTP image)! Perfect simple solution. – EricH206 Jan 06 '17 at 11:50
  • 1
    @Nathan - thanks for the details, but I still don't buy it. For instance, it's under the control of the client whether any credentials or client certs are sent. If it hurts, don't do it (in this case, do not follow the redirect). – Julian Reschke Dec 01 '17 at 05:30
  • 1
    I only don't understand the `location = URLDecoder.decode(location...` part. This decodes a working encoded relative part (with space=+ in my case) into a non-working one. After I removed it, it was OK for me. – Niek Dec 22 '19 at 19:31
  • @Niek I am not sure why you do not need it but I do. – Nathan Dec 23 '19 at 17:42
  • Niek is right: `location = URLDecoder.decode(location, "UTF-8");` must be removed, because it causes an error if your URL contains multi-byte characters. In my case the file name 「LR-001A-序.mp3」 is in my original download URL. After 「location = conn.getHeaderField("Location");」 it becomes 「LR-001B-%E5%BA%8F.mp3」, which is correct if you use that string as the URL for the next connection. After 「location = URLDecoder.decode(location, "UTF-8");」 it becomes 「LR-001B-?.mp3」, which is wrong, and you end up with a 404. – Eyes Blue May 11 '20 at 12:23
26

Has something called HttpURLConnection.setFollowRedirects(false) by any chance?

You could always call

conn.setInstanceFollowRedirects(true);

if you want to make sure you don't affect the rest of the behaviour of the app.
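
To see the difference between the class-wide flag and the per-instance flag without making a request, something like the following sketch can be used. The class name `FollowRedirectFlags` and the helper `open` are invented for this example; `openConnection()` only creates the connection object and does not contact the server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

class FollowRedirectFlags {

    // Create (but do not connect) an HTTP connection with the per-instance flag set.
    static HttpURLConnection open(String url, boolean followRedirects) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(followRedirects); // affects this connection only
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = open("http://httpstat.us/301", false);

        // Static getter: the default that new connections start with
        System.out.println("Class-wide default: " + HttpURLConnection.getFollowRedirects());
        // Instance getter: the value set on this one connection
        System.out.println("This instance:      " + conn.getInstanceFollowRedirects());
    }
}
```

Neither flag makes HttpURLConnection follow a redirect from HTTP to HTTPS; they only control redirects that stay on the same protocol.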

dldnh
Jon Skeet
  • Ooo... didn't know about that... Nice find... I was about to look up the class in case there was logic like that.... It makes sense that it would be returning that header given the single responsibility principle.... now go back to answering C# questions :P [I'm kidding] – monksy Dec 10 '09 at 21:47
  • 2
    Note that setFollowRedirects() should be called on the class, and not on an instance. – karlbecker_com Apr 04 '13 at 21:50
  • 3
    @dldnh: While karlbecker_com was absolutely right about calling `setFollowRedirects` on the type, `setInstanceFollowRedirects` is an *instance* method and can't be called on the type. – Jon Skeet Apr 13 '13 at 06:43
  • 1
    uggh, how did I misread that. sorry about the incorrect edit. also tried to rollback and not sure how I bollocksed that as well. – dldnh Apr 13 '13 at 10:43
7

As mentioned by some of you above, setFollowRedirects and setInstanceFollowRedirects only work automatically when the redirect keeps the same protocol, i.e. from http to http or from https to https.

setFollowRedirects is at class level and sets this for all instances of the URL connection, whereas setInstanceFollowRedirects is only for a given instance. This way we can have different behavior for different instances.

I found a very good example here http://www.mkyong.com/java/java-httpurlconnection-follow-redirect-example/

Shalvika
6

Another option can be to use Apache HttpComponents Client:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

Sample code:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("https://media-hearth.cursecdn.com/avatars/330/498/212.png");
CloseableHttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream is = entity.getContent();
Koray Tugay
-5

HttpURLConnection is not responsible for interpreting the response. It is performing as expected: it grabs the content of the URL requested. It is up to you, the user of the functionality, to interpret the response. It cannot read the intentions of the developer without being told.

monksy
  • 8
    Why does it have setInstanceFollowRedirects in this case? )) – Shcheklein Dec 10 '09 at 21:46
  • My guess is that it was a suggested feature to add in later, it makes sense.. my comment was more of reflected toward... the class is designed to go and grab web content and bring it back... people may want to get non HTTP 200 messages. – monksy Dec 10 '09 at 21:49