5

I am trying to retrieve the final location of a given URL (String ref) as follows:

        HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
        con.setInstanceFollowRedirects(true);
        con.setRequestProperty("User-Agent","");
        int responseCode = con.getResponseCode();
        return con.getURL().toString();

It works in most cases, but occasionally it returns a URL that itself still redirects elsewhere.

What am I doing wrong here?

Why do I get responseCode = 3xx, even after calling setInstanceFollowRedirects(true)?

UPDATE:

OK, responseCode can sometimes be 3xx.

If it happens, then I will return con.getHeaderField("Location") instead.

The code now is:

        HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
        con.setInstanceFollowRedirects(true);
        con.setRequestProperty("User-Agent","");
        int responseType = con.getResponseCode()/100;
        while (responseType == 1)
        {
            Thread.sleep(10);
            responseType = con.getResponseCode()/100;
        }
        if (responseType == 3)
            return con.getHeaderField("Location");
        return con.getURL().toString();

I would appreciate comments if anyone sees anything wrong with the code above.

UPDATE

  • Removed the handling of code 1xx, as according to most commenters it is not necessary.
  • Testing if the Location header exists before returning it, in order to handle code 304.

        HttpURLConnection con = (HttpURLConnection)new URL(ref).openConnection();
        con.setInstanceFollowRedirects(true);
        con.setRequestProperty("User-Agent","");
        if (con.getResponseCode()/100 == 3)
        {
            String target = con.getHeaderField("Location");
            if (target != null)
                return target;
        }
        return con.getURL().toString();
    
barak manos
  • It's not going to follow a redirect for a response that returns 30x but has no `Location` response header. – Mike Samuel Dec 27 '13 at 20:29
  • Aren't 3xx responses always supposed to have a Location header? I'm still puzzled by the fact that I'm getting 3xx in the first place (after setting InstanceFollowRedirects = true), but I've figured that if a 3xx response is returned, then at least I can count on the fact that it also contains a Location header... Is that a wrong assumption? – barak manos Dec 27 '13 at 20:34
  • btw, aren't you forgetting to call con.connect() in these snippets? – Jakub Kotowski Dec 27 '13 at 20:47
  • @jkbkot no, it connects automatically when you check the response code or get the input stream – aditsu quit because SE is EVIL Dec 27 '13 at 20:52
  • @barakmanos, No, 304 requests almost never have `Location` response headers, and [RFC 2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) says "SHOULD" not "MUST" for most of the others w.r.t `Location`. Note that it also recommends that user-agents not follow more than 5 redirect steps and stop redirecting when a cycle is detected. – Mike Samuel Dec 27 '13 at 21:17
  • @MikeSamuel, thank you. Does this mean that con.getHeaderField("Location") will return an empty string or null? What would be the best solution in this case, return con.getURL().toString()? – barak manos Dec 27 '13 at 22:52
  • @barakmanos, `null` indicates a missing header, while empty string would indicate an invalid redirect to the empty URL. I don't know what the best thing to do is; I think that depends on why you're doing this. – Mike Samuel Dec 27 '13 at 23:53

5 Answers

3

HttpURLConnection will not follow redirects if the protocol changes, such as http to https or https to http. In that case, it returns the 3xx code and you can read the Location header. The new URL may itself redirect, so open a new connection for it and repeat: loop until you get a non-redirect response code. Also watch out for infinite redirect loops: set a limit on the number of iterations, or check whether each new URL has already been visited.
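
The loop described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name FinalUrlResolver, the HopLookup interface, and the maxHops parameter are all hypothetical names introduced here. It assumes that a 304 or a 3xx without a Location header should be treated as the final address, and it resolves relative Location values against the current URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

class FinalUrlResolver {

    /** Hypothetical helper: returns the redirect target of one URL, or null if it does not redirect. */
    public interface HopLookup {
        String nextLocation(String url) throws IOException;
    }

    /** Follows redirects by hand; stops at a non-redirect, fails on a cycle or after maxHops redirects. */
    public static String resolve(String start, HopLookup lookup, int maxHops) throws IOException {
        Set<String> visited = new HashSet<>();
        String current = start;
        for (int hop = 0; hop <= maxHops; hop++) {
            if (!visited.add(current)) {
                throw new IOException("Redirect cycle detected at " + current);
            }
            String location = lookup.nextLocation(current);
            if (location == null) {
                return current; // not a redirect, or a 3xx without a Location header
            }
            // Location may be relative, so resolve it against the current URL.
            current = new URL(new URL(current), location).toString();
        }
        throw new IOException("Gave up after " + maxHops + " redirects");
    }

    /** Real lookup: a fresh connection per hop, with automatic following switched off. */
    public static String httpLookup(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setInstanceFollowRedirects(false);
        try {
            int code = con.getResponseCode();
            // Assumption: 304 Not Modified is not a redirect to follow.
            boolean redirect = code / 100 == 3 && code != HttpURLConnection.HTTP_NOT_MODIFIED;
            return redirect ? con.getHeaderField("Location") : null;
        } finally {
            con.disconnect();
        }
    }
}
```

With these names, a call would look like `FinalUrlResolver.resolve(ref, FinalUrlResolver::httpLookup, 5)`. Keeping the lookup behind an interface means the loop itself can be exercised without any network access.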

1

If you just want the redirect url, the response header should give you that:

if (con.getResponseCode() == 301) {
    String redirectUrl = con.getHeaderField("Location");
}
evanwong
  • I want the "final" location, i.e., the URL string after all redirections have completed. So after calling setInstanceFollowRedirects(true), I would expect the response-code to be anything else other than 3xx. – barak manos Dec 27 '13 at 20:05
1

There can easily be multiple levels of redirection: imagine a bit.ly link pointing to a youtu.be address, which in turn points to youtube.com. You may need to loop until you get a 200 OK or until you detect a redirection cycle.

I have trouble locating the source code to check but I believe what I said is true. See e.g. java urlconnection get the final redirected URL

You also might need to handle protocol redirects, e.g. HTTP -> HTTPS: URLConnection Doesn't Follow Redirect

Jakub Kotowski
  • Isn't that exactly what setInstanceFollowRedirects(true) is for? To release the user from worrying about multiple redirections and get the response-code after they are complete? I have previously tried your suggestion, combined with setInstanceFollowRedirects(false). But that solution yielded the wrong result (a "non-final" URL) in certain cases. – barak manos Dec 27 '13 at 20:41
0

I think I now understand what you want. I now think that you are trying to retrieve the final address, not the content of the final address. Please correct me if my assumption is wrong.

To get the address rather than the content, you need a different approach: switch off follow-redirects and follow the redirects yourself, one request at a time, until you get a non-redirecting response. Bear in mind that you cannot reuse a URLConnection; open a new one for each hop.

The two approaches (finding the final address versus retrieving the content of the final address) differ so much because URLConnection does not reveal the followed-to address if you switch on follow-redirects.

In your code, you seem to expect URLConnection.getURL() to return the followed-to address. This is not the behavior of this method. It returns the original URL which you used to create the URLConnection. It does this no matter if you switch on follow-redirects or not.
However, if you switch it on, you will not be able to get the followed-to URL address. This is because getHeaderField("Location"), with follow-redirects, makes no sense: it returns the redirection-target of the final redirect, which should not exist, since it's the final address.
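
What getURL() reports after redirects have been followed can be checked empirically, and the result may depend on the Java runtime in use. The following is a minimal sketch for doing so, assuming the JDK's built-in com.sun.net.httpserver package is available; the class GetUrlCheck, its method names, and the /old and /new paths are hypothetical and exist only for this experiment.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

class GetUrlCheck {

    /** Returns {getURL() before connecting, getURL() after the response arrived}. */
    public static String[] urlsBeforeAndAfter(String ref) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(ref).openConnection();
        con.setInstanceFollowRedirects(true);
        String before = con.getURL().toString();
        con.getResponseCode(); // connects and follows any same-protocol redirects
        String after = con.getURL().toString();
        con.disconnect();
        return new String[] { before, after };
    }

    /** Starts a throwaway local server where /old redirects to /new, then runs the check. */
    public static String[] runAgainstLocalServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/old", ex -> {
            ex.getResponseHeaders().add("Location", "/new");
            ex.sendResponseHeaders(302, -1); // -1 means no response body
            ex.close();
        });
        server.createContext("/new", ex -> {
            ex.sendResponseHeaders(200, -1);
            ex.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return urlsBeforeAndAfter("http://127.0.0.1:" + port + "/old");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] urls = runAgainstLocalServer();
        System.out.println("before: " + urls[0]);
        System.out.println("after:  " + urls[1]);
    }
}
```

Comparing the two printed lines shows whether the connection object updates its URL on your runtime. The local server only covers a same-protocol redirect; a protocol change such as http to https is a separate case.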

Daniel S.
0

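Sometimes the final location ends up in the connection's private requestURI field. This is an implementation detail, so it may not work on every runtime. You can read it via reflection like this: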

val declaredField = con.javaClass.getDeclaredField("requestURI")
declaredField.isAccessible=true
val loc = declaredField.get(con).toString()
utrucceh