python 3.7 urllib.request doesn't follow redirect URL

Question

I'm using Python 3.7 with urllib. Everything works fine, but it does not seem to follow redirects automatically when it receives an HTTP redirect response (307).

This is the error I get:

ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect

I have to handle it with a try-except and manually send another request to the new Location. That works fine, but I don't like it.

This is the piece of code I use to perform the request:

      req = urllib.request.Request(url)
      req.add_header('Authorization', auth)
      req.add_header('Content-Type','application/json; charset=utf-8')
      req.data=jdati  
      self.logger.debug(req.headers)
      self.logger.info(req.data)
      resp = urllib.request.urlopen(req)

`url` is an HTTPS resource, and I set headers with some Authorization info and the content type. `req.data` is a JSON body.

From the urllib documentation I understood that redirects are performed automatically by the library itself, but it doesn't work for me: it always raises an HTTP 307 error and doesn't follow the redirect URL. I've also tried using an opener with the default redirect handler specified, but with the same result:

  opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)          
  req = urllib.request.Request(url)
  req.add_header('Authorization', auth)
  req.add_header('Content-Type','application/json; charset=utf-8')
  req.data=jdati  
  resp = opener.open(req)

What could be the problem?

What's the URL you're accessing? (Not the one you're redirecting to.) — , Jun 15 '20 at 08:17
I found this in RFC 2616, section 10.3.8 (307 Temporary Redirect): "If the 307 status code is received in response to a request other than GET or HEAD, the user agent **MUST NOT automatically redirect the request** unless it can be confirmed by the user, since this might change the conditions under which the request was issued." Could it be the cause? — calabrone, Jun 15 '20 at 08:22
@calabrone good shout. Quoting the Python docs for HTTPRedirectHandler: "Some HTTP redirections require action from this module’s **client** code. If this is the case, HTTPError is raised. See RFC 2616 for details of the precise meanings of the various redirection codes." (my emphasis) -- so you could well be right about 307. In that case your `try-except` probably counts as "confirmed by the user"... (https://docs.python.org/3/library/urllib.request.html#urllib.request.HTTPRedirectHandler) — , Jun 15 '20 at 08:28
@mrblewog It's an HTTPS URL. I'd prefer not to share it because of my business policy. — calabrone, Jun 15 '20 at 08:29
No worries -- as you say in another comment this might be down to the special handling for 307. — , Jun 15 '20 at 08:30
@metatoaster I've also tried with an instance, but I get the same error. — calabrone, Jun 15 '20 at 08:30
@metatoaster I thought the same, but then why does it work in the browser (I've only tried IE11) without any _confirmation by the user_? — calabrone, Jun 15 '20 at 08:33
@metatoaster I think you are right. The same request in Chrome (in that case HTML + JavaScript) is blocked with a CORS error, while in IE11 everything works fine. — calabrone, Jun 15 '20 at 08:45

score 5 · Accepted Answer · edited Oct 07 '21 at 10:55

The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:

If the 307 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

Back to the question: because data has been assigned, get_method automatically returns POST (that is how Request.get_method is implemented: it returns POST whenever data is not None, unless a method was set explicitly). Since the request method is POST and the response code is 307, an HTTPError is raised instead, as per the above specification. In Python's urllib, the exception is raised by HTTPRedirectHandler.redirect_request in the urllib.request module.
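To see both halves of this without touching the network, the sketch below calls `Request.get_method()` and `HTTPRedirectHandler.redirect_request()` directly. The URLs are placeholders that are never contacted, and passing `None` as `fp` and an empty dict as `headers` is a simplification for illustration; during a real request urllib passes the actual response object and its headers.

```python
import urllib.error
import urllib.request

url = "https://example.com/api"        # placeholder URL, never contacted
new_url = "https://example.com/moved"  # placeholder Location target

# Assigning data (as the question does with req.data = jdati) makes
# Request.get_method() report POST; without data it reports GET.
post_req = urllib.request.Request(url)
post_req.data = b'{"key": "value"}'
get_req = urllib.request.Request(url)
print(post_req.get_method(), get_req.get_method())  # POST GET

handler = urllib.request.HTTPRedirectHandler()

# GET + 307: the handler returns a new Request aimed at the Location URL.
follow_up = handler.redirect_request(get_req, None, 307, "Temporary Redirect", {}, new_url)
print(follow_up.full_url)  # https://example.com/moved

# POST + 307: the handler refuses to redirect and raises HTTPError instead.
error = None
try:
    handler.redirect_request(post_req, None, 307, "Temporary Redirect", {}, new_url)
except urllib.error.HTTPError as e:
    error = e  # keep a reference; "e" is unbound after the except block
    print(e)  # HTTP Error 307: Temporary Redirect
```

The POST branch produces the same "HTTP Error 307: Temporary Redirect" message seen in the question's log.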

For an experiment, try the following code:

    import urllib.error
    import urllib.parse
    import urllib.request


    url = 'http://httpbin.org/status/307'
    req = urllib.request.Request(url)
    req.data = b'hello'  # comment out to not trigger manual redirect handling
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise  # not a status code that can be handled here
        redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
        resp = urllib.request.urlopen(redirected_url)  # sent as a GET, without the body
        print('Redirected -> %s' % redirected_url)  # the original redirected url
    print('Response URL -> %s' % resp.url)  # the final url

Running the code as-is may produce the following output:

    Redirected -> http://httpbin.org/redirect/1
    Response URL -> http://httpbin.org/get

Note that the subsequent redirect to /get was followed automatically, because the follow-up request was a GET request. Commenting out the req.data assignment line means no HTTPError is raised, so the "Redirected" line is not printed at all.
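The contrast above (a GET is followed automatically, a POST is not) also suggests a way to avoid the try-except the question dislikes: install a redirect handler that rebuilds the POST for the new URL. This is a minimal sketch, not part of urllib; the class name `RepostOn307Handler` is hypothetical. It deliberately overrides the RFC's "confirmed by the user" rule, and because it forwards every header added with `add_header` (including `Authorization`), it should only be used against servers you trust, ideally ones that redirect within the same host.

```python
import urllib.request


class RepostOn307Handler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that re-sends non-GET requests on a 307.

    It skips the user confirmation RFC 2616 asks for, so only install it
    in an opener used against servers you trust.
    """

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.get_method() not in ("GET", "HEAD"):
            # Rebuild the request for the new URL with the same method, body
            # and user-added headers (e.g. Authorization, Content-Type).
            # req.headers does not contain Host or Content-Length; urllib
            # adds those again when it sends the new request.
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=dict(req.headers),
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method=req.get_method(),
            )
        # Every other case keeps urllib's default behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# Usage sketch: pass the handler class to build_opener instead of the default one.
opener = urllib.request.build_opener(RepostOn307Handler)
# resp = opener.open(req)  # req built as in the question
```

Because `build_opener` leaves out the default `HTTPRedirectHandler` when it is given a subclass of it, the opener still handles all other redirects the way urllib normally does.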

Two other things to note in the experiment's exception-handling block: e.read() can be called to retrieve the response body the server sent with the HTTP 307 response (since data was posted, there may be a short entity in the response worth processing), and urljoin is needed because the Location header may be a relative URL (for example, one without the host) pointing to the subsequent resource.
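Both points can be checked offline by building the `HTTPError` by hand, which the documented `urllib.error.HTTPError(url, code, msg, hdrs, fp)` constructor allows. The relative `/redirect/1` Location mirrors the httpbin example above; the `b'moved'` body and the `redirect_target` helper are made up for illustration.

```python
import io
import urllib.error
import urllib.parse
from email.message import Message


def redirect_target(request_url, err):
    """Hypothetical helper: resolve a redirect's Location against the request URL."""
    return urllib.parse.urljoin(request_url, err.headers["Location"])


url = "http://httpbin.org/status/307"

# Simulated 307 response (no network): relative Location plus a short body.
hdrs = Message()
hdrs["Location"] = "/redirect/1"
err = urllib.error.HTTPError(url, 307, "Temporary Redirect", hdrs, io.BytesIO(b"moved"))

body = err.read()                   # the entity sent along with the 307
target = redirect_target(url, err)  # relative Location joined onto the request URL
print(body)    # b'moved'
print(target)  # http://httpbin.org/redirect/1

# urljoin also copes with absolute and scheme-relative Location values.
print(urllib.parse.urljoin(url, "https://example.com/x"))  # https://example.com/x
print(urllib.parse.urljoin(url, "//example.com/x"))        # http://example.com/x
```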

Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that none of those earlier questions ever got an answer. They are as follows:
