2

I'm writing a Java program that expands short URLs. Given a Java URLConnection, which of these two approaches is better for getting the expanded URL?

Connection.getHeaderField("Location");

vs

Connection.getURL();

I assume both give the same output. The first approach did not work well: only 1 out of 7 URLs was resolved. Can the second approach improve on that?

Is there a better approach?

palacsint
R1234

2 Answers

5

I'd use the following:

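import static org.junit.Assert.assertEquals;

import java.net.HttpURLConnection;
import java.net.URL;
import org.junit.Test;

@Test
public void testLocation() throws Exception {
    final String link = "http://bit.ly/4Agih5";

    final URL url = new URL(link);
    final HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    // Do not follow the redirect, so the Location header stays readable.
    urlConnection.setInstanceFollowRedirects(false);

    // Reading a header field sends the request implicitly.
    final String location = urlConnection.getHeaderField("location");
    assertEquals("http://stackoverflow.com/", location);
    // getURL() still returns the short URL because no redirect was followed.
    assertEquals(link, urlConnection.getURL().toString());
}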

With setInstanceFollowRedirects(false), the HttpURLConnection does not follow redirects, so the destination page (stackoverflow.com in the above example) is not downloaded; only the redirect page from bit.ly is.

One drawback is that when a resolved bit.ly URL points to another short URL, for example on tinyurl.com, you will get the tinyurl.com link, not the URL that tinyurl.com redirects to.
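One way to handle such chained short URLs is to read the Location header in a loop and stop at the first response that is not a redirect. The following is only a minimal sketch: the class name RedirectChain, the methods expand and resolve, and the maxHops limit are hypothetical names chosen for illustration, and the sketch assumes every Location value is a syntactically valid URI (URI.create throws IllegalArgumentException otherwise).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectChain {

    /** Resolves a Location value (possibly relative) against the URL that returned it. */
    public static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /** Follows Location headers by hand, one hop at a time, for at most maxHops hops. */
    public static String expand(String shortUrl, int maxHops) throws IOException {
        String current = shortUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // inspect each hop ourselves
            try {
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                // Stop at the first response that is not a 3xx with a Location header.
                if (status < 300 || status >= 400 || location == null) {
                    return current;
                }
                current = resolve(current, location);
            } finally {
                conn.disconnect();
            }
        }
        return current; // hop limit reached: return the last URL seen
    }
}
```

Calling RedirectChain.expand("http://bit.ly/4Agih5", 5) would follow at most five redirects. The hop limit guards against redirect loops; the value 5 is arbitrary.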

Edit:

To see the response from bit.ly, use curl:

$ curl --dump-header /tmp/headers http://bit.ly/4Agih5
<html>
<head>
<title>bit.ly</title>
</head>
<body>
<a href="http://stackoverflow.com/">moved here</a>
</body>
</html>

As you can see, bit.ly sends only a short redirect page. Then check the HTTP headers:

$ cat /tmp/headers
HTTP/1.0 301 Moved Permanently
Server: nginx
Date: Wed, 06 Nov 2013 08:48:59 GMT
Content-Type: text/html; charset=utf-8
Cache-Control: private; max-age=90
Location: http://stackoverflow.com/
Mime-Version: 1.0
Content-Length: 117
X-Cache: MISS from cam
X-Cache-Lookup: MISS from cam:3128
Via: 1.1 cam:3128 (squid/2.7.STABLE7)
Connection: close

It sends a 301 Moved Permanently response with a Location header (which points to http://stackoverflow.com/). Modern browsers don't show you the HTML page above. Instead they automatically redirect you to the URL in the Location header.
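Since the redirect target travels in the headers, a HEAD request may be enough to read it without transferring even the small HTML body. This is a sketch under assumptions: the names HeadProbe, isRedirect and locationOf are hypothetical, and it assumes the shortener answers HEAD the same way it answers GET, which is not guaranteed for every service.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class HeadProbe {

    /** True for the status codes that announce a redirect through a Location header. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Returns the Location header of a redirect response, or null if there is none. */
    public static String locationOf(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");          // ask for headers only, no body
        conn.setInstanceFollowRedirects(false); // keep the redirect response itself
        try {
            return isRedirect(conn.getResponseCode())
                    ? conn.getHeaderField("Location")
                    : null;
        } finally {
            conn.disconnect();
        }
    }
}
```

Checking the status code before trusting the header separates a real redirect from an error page or a normal 200 response.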

palacsint
  • Can you explain what the last two lines are doing? The rest of my code is exactly the same. – R1234 Oct 17 '11 at 15:40
  • Also, if I set FollowRedirects to true, does it affect performance significantly? – R1234 Oct 17 '11 at 15:41
  • They are [jUnit assertion methods](http://www.junit.org/apidocs/org/junit/Assert.html#assertEquals%28java.lang.Object,%20java.lang.Object%29); they check whether the first and second parameters are equal. In the example they are equal. Performance: if `instanceFollowRedirects` is `true`, you download a page from `bit.ly` and a second one from wherever `bit.ly` redirects to (in the example, `stackoverflow.com`). With `false` you download only one page, so you use less bandwidth. – palacsint Oct 17 '11 at 18:45
  • @palacsint, if I remove the line "urlConnection.setInstanceFollowRedirects(false);", there will be a failure. What's the reason for that? – LiangWang Nov 06 '13 at 02:50
  • @Jacky: I've started to edit the page, but I realized that "there will be a failure" is a little ambiguous. What do you mean by "failure"? – palacsint Nov 06 '13 at 09:36
  • @palacsint, "urlConnection.getHeaderField("location");" will return null if I do that – LiangWang Nov 06 '13 at 20:13
  • @Jacky: I think in this case `getHeaderField()` reads the headers of the last HTTP response, which comes from `stackoverflow.com` and does not contain a `Location` header. – palacsint Nov 06 '13 at 23:05
  • @palacsint bit.ly/9mglq8 doesn't have any Location field in the header. How do I get the expanded URL? And one more thing: how can one handle the case where there are 2 or 3 redirects? – MONU KUMAR May 12 '18 at 12:32
  • @MONUKUMAR: `wget -S bit.ly/9mglq8` shows a `Location` header for me. It must be there, otherwise the redirect would not work in browsers. For multiple redirects, use a loop that checks the HTTP status code. – palacsint May 12 '18 at 16:01
2

The following link contains a more complete method along the same lines as the previous answer: https://github.com/cpdomina/WebUtils/blob/master/src/net/cpdomina/webutils/URLUnshortener.java

plb