4

This URL:

http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3

Redirects to:

http://www.callmyname.sg/view/Multiple+Choices/Uk9JRC9TRzA0SkstQkJDNkRFNTEuMTNCNS9FRDY5LUE4NzgtRUY=

With requests, I tried:

import requests

url = "http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3"
response = requests.get(url)
response.url

response.url is still the first URL, not the redirected one.

Peter Mortensen
muffinz

4 Answers

9

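Here is an example. I used a bit.ly link because your URL returned a 403 for me. With allow_redirects=False the redirect is not followed, so the target can be read from the Location header.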


>>> import requests
>>> url = "http://bit.ly/18SuUzJ"
>>> r = requests.get(url, allow_redirects=False)
>>> r.status_code
301
>>> r.headers['Location']
'http://stackoverflow.com/'

Peter Mortensen
Deck
2

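According to the Requests documentation, r.history is what you need. It lists the intermediate redirect responses, oldest first, and is empty if no redirect was followed.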
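As a minimal sketch of how r.history can be inspected, the helper below collects the status code and URL of every hop. The function name redirect_chain and the timeout value are illustrative choices, not part of Requests.

```python
import requests


def redirect_chain(url, timeout=10):
    """Return (status_code, url) for every hop, ending with the final response.

    redirect_chain is a hypothetical helper name used for illustration.
    """
    response = requests.get(url, timeout=timeout)
    # response.history holds the intermediate redirect responses, oldest first.
    # It is empty when no redirect was followed.
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops
```

Calling redirect_chain(url) on a URL that redirects once should give two entries: the redirecting response followed by the final one. If only one entry comes back, the server never sent a redirect for that request.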

ljk321
2

A HEAD request can be faster than a GET request, even when the GET does not follow redirects. The response to a HEAD request carries only the headers, whereas the response to a GET also carries the body.

>>> import requests
>>> response = requests.head('https://bit' + '.ly/pyre', allow_redirects=False)
>>> response.is_redirect
True
>>> response.headers['Location']
'http://www.python.org/doc/current/library/re.html'

The above approach identifies exactly one level of redirect. Also, to keep it simple, I use requests.head instead of requests.Session().head.
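If more than one level of redirect needs to be resolved with HEAD requests, the single step above can be repeated in a loop. The following is only a sketch under assumptions: resolve_redirects and max_hops are hypothetical names, and it assumes the server answers HEAD the same way it answers GET, which not every site does.

```python
from urllib.parse import urljoin

import requests


def resolve_redirects(url, max_hops=10, timeout=10):
    """Follow redirects one HEAD request at a time and return the final URL.

    resolve_redirects and max_hops are hypothetical names for illustration.
    """
    for _ in range(max_hops):
        response = requests.head(url, allow_redirects=False, timeout=timeout)
        if not response.is_redirect:
            # No redirect status with a Location header: this is the last hop.
            return url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, response.headers["Location"])
    raise RuntimeError(f"gave up after {max_hops} redirects")
```

The max_hops cap is there so that a redirect loop raises an error instead of running forever.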

Asclepius
0

This site seems to require a session cookie in order for the redirect to work.

r.url does in fact show the URL after the redirect (unless you have turned off redirect following, for example with allow_redirects=False).

The problem with your redirect is that it never happens if the cookie isn't already there. You can test that by visiting the URL with a browser in incognito/private mode. You will see an error message from http://www.yellowpages.com.sg/ with a status code 200. On a reload you will then be redirected.

Strangely, I cannot get a redirect even with a requests session. Using a real browser's user agent string doesn't seem to help, either. You might have to compare the two requests in detail to find what the crucial difference is.

The code I tried looks like this:

import requests

# Placeholder: substitute a real browser's User-Agent string.
headers = {'User-Agent': 'user_agent'}
s = requests.Session()

# Visit the home page first so the session can pick up the cookie.
url = "http://www.yellowpages.com.sg/"
r = s.get(url, headers=headers)

# Request the redirect URL with the same session.
url = "http://www.yellowpages.com.sg/newiyp/UrlRedirect?applicationInd=yp&searchType=68&searchCriteria=multiple+choices&accessType=8&advertiserName=Multiple+Choices&url=62CE8F02A1BE04A51C81F85D1CE8B54DFC608A9CDA925C15EED5DA6DD90E3F7DC99CFF77216D1D1083877BA841EB97C3"
r = s.get(url, headers=headers)
r.url
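One possible way to compare what requests sends with what the browser sends is to dump the headers and cookies of each exchange and diff them against the browser's network tab. This is a sketch only: summarize_exchange and trace are hypothetical helper names, and it does not establish what this particular site actually checks.

```python
import requests


def summarize_exchange(response):
    """Collect the parts of one request/response pair that are worth diffing."""
    return {
        "request_url": response.request.url,
        # Headers as actually sent, including any Cookie header from the session.
        "request_headers": dict(response.request.headers),
        "status": response.status_code,
        "location": response.headers.get("Location"),
        "set_cookie": response.headers.get("Set-Cookie"),
    }


def trace(session, url, headers=None, timeout=10):
    """Fetch url without following redirects and summarize the exchange."""
    response = session.get(
        url, headers=headers, allow_redirects=False, timeout=timeout
    )
    return summarize_exchange(response)
```

Calling trace(s, url, headers=headers) once for the home page and once for the redirect URL shows whether a cookie was set by the first response and whether it was sent back with the second request.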
jsfan