
I want to unshorten URLs to get the real address. In some cases there is more than one redirect. I have tried using urllib2, but it seems to make GET requests, which consumes time and bandwidth. I want to get only the headers so that I have the final URL without downloading the whole body of the page. Thanks.

1 Answer


You need to send an HTTP HEAD request to get just the headers.

The second answer to "How do you send a HEAD HTTP request in Python 2?" shows how to perform a HEAD request using urllib.
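As a rough illustration of the HEAD approach, here is a minimal sketch that follows a chain of redirects one hop at a time. It assumes Python 3, where `httplib` and `urlparse` became `http.client` and `urllib.parse`; the `fetch` and `max_hops` parameters are hypothetical additions for this sketch, not part of any library.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def head(url, timeout=10):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        # Keep the query string; redirect targets often depend on it.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def unshorten_url(url, fetch=head, max_hops=10):
    """Follow redirects with HEAD requests until a non-redirect response."""
    for _ in range(max_hops):
        status, location = fetch(url)
        if status // 100 != 3 or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return url  # hop limit reached; return the last URL seen
```

Calling `unshorten_url("http://nyti.ms/pvUj8c")` would issue only HEAD requests, so no page body is downloaded. The hop limit is there to guard against redirect loops.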

Gregor
  • @Pranay: Also see the `urllib2` answer in the above-mentioned link. – RanRag Feb 10 '12 at 12:31
  • I have already tried that, as given here http://stackoverflow.com/questions/4201062/how-can-i-unshorten-a-url-using-python but it doesn't work. For example, the address "http://nyti.ms/pvUj8c" gives an error: socket.error: [Errno 111] Connection refused – Pranay Agarwal Feb 10 '12 at 12:32
  • @RanRag: It seems your solution also downloads the whole page! – Pranay Agarwal Feb 10 '12 at 12:34
  • import httplib
    import urlparse

    def unshorten_url(url):
        parsed = urlparse.urlparse(url)
        h = httplib.HTTPConnection(parsed.netloc)
        h.request('HEAD', parsed.path)
        response = h.getresponse()
        if response.status/100 == 3 and response.getheader('Location'):
            return response.getheader('Location')
        else:
            return url

    – Pranay Agarwal Feb 10 '12 at 12:37
  • @Gregor: Sorry for the formatting, but I just posted the code I am using, and I am getting this error: socket.error: [Errno 111] Connection refused – Pranay Agarwal Feb 10 '12 at 12:44
  • Your code works for me. `unshorten_url("http://nyti.ms/pvUj8c")` returns `'http://feeds.nytimes.com/click.phdo?i=a621defd69034ebb3e896cbe284a8ae4'` – Gregor Feb 10 '12 at 12:48
  • @Gregor: Strange. It actually works for some addresses, like www.iitd.ac.in (my college), but not for the one I mentioned. When I use urllib it works fine for all the shortened URLs, but it takes time. BTW, I am behind a proxy. Should that matter? – Pranay Agarwal Feb 10 '12 at 12:58
  • If your proxy does not allow/forward HEAD requests, this can't work. – Gregor Feb 10 '12 at 13:04
  • @Gregor: Oh, thanks. I guess that is the issue. But is there any way to confirm that my proxy is the culprit (some command to check proxy behavior), or is there any way around this without making GET requests? – Pranay Agarwal Feb 10 '12 at 13:26
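One way to probe the proxy question raised in the comments is to send the HEAD request through the proxy explicitly and look at the status that comes back. This is only a sketch, assuming Python 3 and a plain HTTP forward proxy; `head_via_proxy`, `proxy_host`, and `proxy_port` are hypothetical names standing in for your own settings.

```python
import http.client


def head_via_proxy(url, proxy_host, proxy_port, timeout=10):
    """Send a HEAD request for `url` through a plain HTTP proxy."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # A forward proxy expects the absolute URL as the request target.
        conn.request("HEAD", url)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

A 3xx status with a `Location` value would suggest the proxy does forward HEAD requests, while a status such as 405 or 501 would suggest it does not. Note that `httplib` connects directly to the target host and does not pick up proxy settings on its own, whereas `urllib2` reads them from the environment, so a "Connection refused" error from the direct version may only mean that direct outbound connections are blocked.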