
I am using a somewhat standard pattern for putting retry behavior around requests in Python:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    method_whitelist=HTTP_RETRY_METHODS,
    backoff_factor=HTTP_BACKOFF_FACTOR
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)

...

try:
    response = http.get(... some request params ...)
except requests.exceptions.RetryError as err:
    # Do logic with err to perform error handling & logging.

Unfortunately, the docs on RetryError don't explain anything, and when I intercept the exception object as above, err.response is None. You can call str(err) to get the exception's message string, but recovering the specific response details from it would require unreasonable string parsing, and even if one is willing to try that, the message elides the necessary details. For example, a deliberate call to a site returning 400s (not that you would really retry on a 400, this is just for debugging) gives a message of "(Caused by ResponseError('too many 400 error responses'))". That message drops the actual response details, such as the site's own description of the 400 error, which could be critical for deciding how to handle it, or simply needed for logging the error.
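A minimal offline sketch of the exception chain described above. It builds the objects by hand instead of making a request, assuming (as requests' HTTPAdapter does when the MaxRetryError reason is a ResponseError) that urllib3's MaxRetryError is wrapped as the first argument of RetryError. The URL and pool=None are placeholders, and the real adapter also passes request=..., which is omitted here.

```python
from requests.exceptions import RetryError
from urllib3.exceptions import MaxRetryError, ResponseError

# Hand-built stand-in for what the adapter raises once status retries run out.
# pool=None and "/some/path" are placeholders; the reason text mirrors urllib3's wording.
max_retry = MaxRetryError(None, "/some/path", ResponseError("too many 400 error responses"))
err = RetryError(max_retry)

print(err.response)              # None: no response object is attached
print(err.args[0] is max_retry)  # True: the urllib3 exception is the first argument
print(repr(err.args[0].reason))  # ResponseError('too many 400 error responses')
print(str(err))                  # only the count-based message, no status text or body
```

The only status information left is the code embedded in the ResponseError text; the server's reason phrase and body are gone.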

What I want is to receive the response from the last unsuccessful retry attempt and use its status code and description to determine the handling logic. Even though I want the request to be robust behind retries, I still need to know the underlying failure, beyond "too many retries", when I ultimately handle the error.

Is it possible to extract this information from the exception raised for retries?

ely
  • No, it is not possible without "creating a subclass to customize missing behavior from those libraries". – aaron Aug 30 '22 at 14:44

3 Answers


It's not directly supported by the libraries.

It can be achieved by subclassing Retry to attach the response to MaxRetryError:

from requests.adapters import MaxRetryError, Retry


class MyRetry(Retry):

    def increment(self, *args, **kwargs):
        try:
            return super().increment(*args, **kwargs)
        except MaxRetryError as ex:
            # urllib3 passes the failed response as a keyword argument.
            response = kwargs.get('response')
            if response is not None:
                # Cache the body so it can still be read after the connection is drained.
                response.read(cache_content=True)
                ex.response = response
            raise

Usage:

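# retry_strategy = Retry(
retry_strategy = MyRetry(
    ...  # same arguments as before
)
try:
    response = http.get(...)
except requests.exceptions.RetryError as err:
    # Do logic with err to perform error handling & logging.
    print(err.args[0].response.status)
    print(err.args[0].response.data)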
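To check the subclass without a network, a sketch that calls increment() directly with a hand-built urllib3 HTTPResponse standing in for the last failed attempt. The body text, status and URL are made-up test values, and MyRetry is re-declared here importing from urllib3 directly (requests.adapters re-exports the same classes) so the snippet stands alone.

```python
import io

from urllib3.exceptions import MaxRetryError
from urllib3.response import HTTPResponse
from urllib3.util.retry import Retry


class MyRetry(Retry):
    """Retry that attaches the last response to MaxRetryError."""

    def increment(self, *args, **kwargs):
        try:
            return super().increment(*args, **kwargs)
        except MaxRetryError as ex:
            response = kwargs.get("response")
            if response is not None:
                # Cache the body so it stays readable after the connection is drained.
                response.read(cache_content=True)
                ex.response = response
            raise


def exhaust_once(body: bytes, status: int) -> MaxRetryError:
    """Simulate the final failed attempt: no retries left, one bad response."""
    retry = MyRetry(total=0, status_forcelist=[status])
    # Hypothetical response; preload_content=False mimics an unread network stream.
    fake = HTTPResponse(body=io.BytesIO(body), status=status, preload_content=False)
    try:
        retry.increment(method="GET", url="/resource", response=fake)
    except MaxRetryError as exc:
        return exc
    raise AssertionError("expected MaxRetryError")


err = exhaust_once(b'{"error": "missing parameter"}', 400)
print(err.response.status)  # 400
print(err.response.data)    # b'{"error": "missing parameter"}'
print(err.reason)           # too many 400 error responses
```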
aaron
  • It doesn't work out of the box due to the response object not being injected at RequestException https://github.com/psf/requests/blob/main/src/requests/exceptions.py#L17-L24. In order to bypass it, I had to monkey patch it, like: ```python class MyRetryError(RetryError): def __init__(self, *args, **kwargs): kwargs["response"] = getattr(args[0], "response", None) if args else None super(RetryError, self).__init__(*args, **kwargs) requests.exceptions.RetryError = MyRetryError ``` Does it make sense? – gustahrodrigues Aug 22 '23 at 11:52
  • @gustahrodrigues It works with `err.args[0].response` as per my answer. If you want it with `err.response`, it is indeed necessary to patch `RetryError` as you say. – aaron Aug 22 '23 at 12:49

We can't get a response for every exception, because the request may not have been sent yet, or the request or response may not have reached its destination. For example, these exceptions don't come with a response:

urllib3.exceptions.ConnectTimeoutError
urllib3.exceptions.SSLError
urllib3.exceptions.NewConnectionError

There's a parameter in urllib3.util.Retry named raise_on_status, which defaults to True. If it's set to False, urllib3.exceptions.MaxRetryError won't be raised when the status retries are exhausted; the last response is returned instead. So if no exception is raised, a response has certainly arrived, and it becomes easy to call response.raise_for_status() in the else block of the try statement, wrapped in another try.

I've changed except RetryError to except Exception to catch other exceptions.

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

# DEFAULT_ALLOWED_METHODS = frozenset({'DELETE', 'GET', 'HEAD', 'OPTIONS', 'PUT', 'TRACE'})
#     Default methods to be used for allowed_methods
# RETRY_AFTER_STATUS_CODES = frozenset({413, 429, 503})
#     Default status codes to be used for status_forcelist
#     (in practice these are only retried when the response has a Retry-After
#     header, so status_forcelist is set explicitly below)

HTTP_RETRY_LIMIT = 3
HTTP_RETRY_CODES = [500, 502, 503, 504]
HTTP_BACKOFF_FACTOR = 0.2

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    backoff_factor=HTTP_BACKOFF_FACTOR,
    # Return the last response instead of raising once status retries run out.
    raise_on_status=False,
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)
try:
    response = http.get("https://httpbin.org/status/503")
except Exception as err:
    # No usable response: connection errors, timeouts, SSL errors, etc.
    print(err)
else:
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        # Do logic with err to perform error handling & logging.
        print(response.reason)
        # Or
        # print(err.response.reason)
    else:
        print(response.text)

Test:

# https://httpbin.org/user-agent
➜  python requests_retry.py
{
  "user-agent": "python-requests/2.28.1"
}

# url =  https://httpbin.org/status/503
➜  python requests_retry.py
SERVICE UNAVAILABLE
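The test above depends on httpbin.org. A sketch of the same raise_on_status=False idea against a throwaway local server, so the retries can actually be counted. The handler class, its body text and the ephemeral port are made up for the demo; backoff_factor=0 keeps it fast, and trust_env=False keeps proxy environment variables from intercepting the localhost call.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


class AlwaysUnavailable(BaseHTTPRequestHandler):
    """Hypothetical overloaded server: every GET gets a 503 with a text body."""
    hits = 0

    def do_GET(self):
        type(self).hits += 1
        body = b"maintenance window, try again later"
        self.send_response(503)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), AlwaysUnavailable)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

retry_strategy = Retry(total=3, status_forcelist=[503], backoff_factor=0, raise_on_status=False)
http = requests.Session()
http.trust_env = False
http.mount("http://", HTTPAdapter(max_retries=retry_strategy))

response = http.get(url, timeout=5)  # no RetryError: the last 503 is returned
try:
    response.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(err.response.status_code, err.response.text)
print(AlwaysUnavailable.hits)  # first attempt plus three retries

server.shutdown()
server.server_close()
```

The server should see four requests, and the final 503 comes back as an ordinary response whose status and body are still available for error handling.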
Nizam Mohamed
  • It doesn't make sense to say 'a request may not have been made' - the entire purpose of setting up a retry is to retry _the request itself_ - any kind of exception that is raised pre-request, such as the examples you listed, should quite obviously be raised as its own exception, wholly separate from anything within `RetryError` - you have not yet even reached the thing-to-be-tried if you are hitting e.g. SSLError. Further, the retry adapter asks for a parameter called `status_forcelist` to explicitly describe the HTTP response codes for which retrying is permitted. – ely Sep 02 '22 at 17:08
  • How can `status_forcelist` function as it does if there are other, non-HTTP-error-code-related exceptions or errors that the retry mechanism is (essentially ignoring the user) deciding to retry upon encountering? Wouldn't that be functionally and semantically completely at odds with the retry mechanism itself? – ely Sep 02 '22 at 17:09
  • Nonetheless, despite my skepticism about the above points, your solution with `raise_on_status` is a great idea and can be used to solve the original question under the terms of the bounty. Nice job and thank you for looking deeply into it. – ely Sep 02 '22 at 17:11
  • @ely The retry mechanism keeps trying the *unrequited* ;-) bound by its arguments. It doesn't care which TCP/IP layer broke. The `status_forcelist` is to treat HTTP return codes as error at your will. It can be emptied if needed. – Nizam Mohamed Sep 03 '22 at 15:48
  • That doesn't align with the [documentation of `status_forcelist`](https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html), "A retry is initiated if the request method is in `allowed_methods` and the response status code is in `status_forcelist`" - if there is not a response (because some other error occurred before a request was made, or any other reason), then there cannot be the defined trigger of the retry, and such an error would immediately raise a different exception outside of anything to do with any retries (at least this is what the documentation _says_ will happen). – ely Sep 06 '22 at 15:21
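The disagreement above can be probed directly on urllib3's Retry object. A sketch assuming urllib3 1.26 or 2.x; ConnectTimeoutError is just one example of a connection-level failure, and the URL is a placeholder. It shows that status codes are retried only via status_forcelist (or 413/429/503 with a Retry-After header), while connection failures consume the same total budget regardless of status_forcelist. When that budget runs out, the MaxRetryError reason is the original error rather than a ResponseError, which is why requests reports it as ConnectionError/ConnectTimeout and not as RetryError.

```python
from urllib3.exceptions import ConnectTimeoutError, MaxRetryError
from urllib3.util.retry import Retry

plain = Retry(total=3)
forced = Retry(total=3, status_forcelist=[503])

# Status-based retries: only forcelisted codes, or Retry-After codes with the header.
plain_503 = plain.is_retry("GET", 503)
forced_503 = forced.is_retry("GET", 503)
plain_503_retry_after = plain.is_retry("GET", 503, has_retry_after=True)
print(plain_503, forced_503, plain_503_retry_after)  # False True True

# Connection-level failures decrement the same total, status_forcelist or not.
after_timeout = forced.increment(method="GET", url="/", error=ConnectTimeoutError("connect timed out"))
print(after_timeout.total)  # 2

# When the budget is exhausted, the reason is the original connection error.
try:
    Retry(total=0).increment(method="GET", url="/", error=ConnectTimeoutError("connect timed out"))
except MaxRetryError as err:
    exhausted_reason = err.reason
print(type(exhausted_reason).__name__)  # ConnectTimeoutError
```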

As aaron already indicated, the error you are trying to catch and the one the library raises are not the same. This also depends heavily on the library version, as the Retry class has changed between versions (Retry is also importable via from requests.adapters import Retry, and so is RetryError).
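One concrete version difference: urllib3 1.26 introduced allowed_methods and deprecated method_whitelist, and urllib3 2.0 removed method_whitelist, so the question's Retry(method_whitelist=...) raises TypeError on current installs. A sketch of a version-tolerant constructor; make_retry is a hypothetical helper, not part of either library, and the constants reuse the values from the code below.

```python
from urllib3.util.retry import Retry


def make_retry(total, status_forcelist, methods, backoff_factor):
    """Hypothetical helper: build a Retry on both old and new urllib3."""
    kwargs = dict(total=total, status_forcelist=status_forcelist, backoff_factor=backoff_factor)
    try:
        # urllib3 >= 1.26 spelling.
        return Retry(allowed_methods=methods, **kwargs)
    except TypeError:
        # urllib3 < 1.26 only knows the original keyword.
        return Retry(method_whitelist=methods, **kwargs)


retry = make_retry(
    1,
    [403, 400, 401, 429, 500, 502, 503, 504],
    ["HEAD", "GET", "OPTIONS", "TRACE", "POST"],
    1,
)
print(retry.is_retry("POST", 503))    # True: POST is explicitly allowed
print(retry.is_retry("DELETE", 503))  # False: DELETE is not in the list
```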

Working Code

The following code was tested on requests 2.27.1 and Python 3.7.12, with Retry imported from urllib3 as you did:

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


HTTP_RETRY_LIMIT = 1
HTTP_RETRY_CODES = [403, 400, 401, 429, 500, 502, 503, 504]
HTTP_RETRY_METHODS = ['HEAD', 'GET', 'OPTIONS', 'TRACE', 'POST']
HTTP_BACKOFF_FACTOR = 1

retry_strategy = Retry(
    total=HTTP_RETRY_LIMIT,
    status_forcelist=HTTP_RETRY_CODES,
    allowed_methods=HTTP_RETRY_METHODS, # changed to allowed_methods
    backoff_factor=HTTP_BACKOFF_FACTOR
)
adapter = HTTPAdapter(max_retries=retry_strategy)
http = requests.Session()
http.mount("https://", adapter)
http.mount("http://", adapter)
try:
    response = http.get('https://www.howtogeek.com/wp-content/uploads/2018/06/')
except (requests.exceptions.RetryError, requests.exceptions.ConnectionError) as err:
    # Do logic with err to perform error handling & logging.
    print(err)
    print(err.args[0].reason)

I got the following output:

requests.exceptions.RetryError: HTTPSConnectionPool(host='www.howtogeek.com', port=443): Max retries exceeded with url: /wp-content/uploads/2018/06/ (Caused by ResponseError('too many 403 error responses'))
too many 403 error responses

Alternative with sys.exc_info()

If this isn't enough, you can use the traceback module or sys.exc_info() (indexing 0, 1 or 2); see this Stack Overflow question for more. In your case you would do something like:

import traceback, sys
try:
    response = http.get('https://www.howtogeek.com/wp-content/uploads/2018/06/')
except (requests.exceptions.RetryError, requests.exceptions.ConnectionError) as err:
    # Do logic with err to perform error handling & logging.
    print(sys.exc_info()[0]) # just the class of the exception, check the link for more info

This prints the exception class, which you can use in your error handling (this can also be combined with catching the generic Exception):

<class 'requests.exceptions.ConnectionError'>

This gives you a lot of control: with info = sys.exc_info()[1] you obtain the actual exception object, and can then access the following:

print(info.request.url)
print(info.request.headers)
# and probably most important for you
print(info.args[0].reason) # urllib3.exceptions.ResponseError('too many 403 error responses')

which gives the information you need:

https://www.howtogeek.com/wp-content/uploads/2018/06/
{'User-Agent': 'python-requests/2.27.1', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive'}
too many 403 error responses
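Inside the except block, sys.exc_info()[1] is the very exception object already bound by `as err`, so the same attributes are reachable through either name. A small offline sketch confirming this; the RetryError is built by hand around a MaxRetryError (pool=None and the path are placeholders standing in for what the adapter would raise).

```python
import sys

from requests.exceptions import RetryError
from urllib3.exceptions import MaxRetryError, ResponseError

# Hand-built stand-in for the adapter's exception (placeholder pool and path).
cause = MaxRetryError(None, "/wp-content/uploads/2018/06/", ResponseError("too many 403 error responses"))

try:
    raise RetryError(cause)
except RetryError as err:
    exc_type, exc_value, _ = sys.exc_info()
    same_object = exc_value is err
    reason = exc_value.args[0].reason

print(exc_type.__name__)  # RetryError
print(same_object)        # True
print(reason)             # too many 403 error responses
```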

Alternatively, for even more information, print the full traceback (though making use of it depends on parsing):

print(traceback.format_exc()) # Returns full stack trace, might not be most useful in your case
Warkaz