
My problem is with error handling of the Python urllib error object: I am unable to read the error message while keeping it intact in the error object, so that it can be consumed later.

```python
import urllib.error
import urllib.request

try:
    urllib.request.urlopen(request)  # request that will raise an HTTPError
except urllib.error.HTTPError as err:
    err.read()  # returns the error body
    err.read()  # is empty now; err.seek(0) does not work either
```

This is how I intend to use it, but when the exception bubbles up, the second .read() call returns an empty result.

```python
try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    self.log.exception(err.read())  # consumes the error body
    raise err  # upstream handlers now get b'' from err.read()
```

I also tried making a deep copy of the err object,

```python
import copy
try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    err_obj_copy = copy.deepcopy(err)
    self.log.exception(
        "Method:{}\n"
        "URL:{}\n"
        "Data:{}\n"
        "Details:{}\n"
        "Headers:{}".format(method, url, data, err_obj_copy.read(), headers))
    raise err
```

but copy.deepcopy fails with TypeError: __init__() missing 5 required positional arguments: 'url', 'code', 'msg', 'hdrs', and 'fp'.

How do I read the error message, while still keeping it intact in the object?

I do know how to do this using requests, but I am stuck with legacy code and need to make it work with urllib.

Amey

2 Answers


This is what I did, and it worked for me.

When reading the error for the first time, save the body to a variable, e.g. msg = resp.read().decode('utf8'). You can then create a new HTTPError instance with that message and propagate it.

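```python
import io
from urllib.error import HTTPError
try:
    urllib.request.urlopen(request)
except HTTPError as resp:  # the raised error also acts as the response
    msg = resp.read().decode('utf8')
    self.log.exception(msg)
    raise HTTPError(resp.url, resp.code, resp.reason, resp.headers, io.BytesIO(bytes(msg, 'utf8')))
```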
Floyd Kots
  • You should save the result of `resp.read()` so that you pass the raw bytes back to `HTTPError` instead of re-encoding the text. See @jf's answer below. – reubano Jan 29 '17 at 17:21
  • Thank you @reubano. It sure is better that way. I don't understand why, at first, when I tried to pass in the raw bytes, the variable `msg` would remain as an empty `bytestring` object. I must have been doing something wrong. I think that's why I decoded the `bytestring`. – Floyd Kots Jan 29 '17 at 18:40

The error object may read its body directly from the network. A network stream is not seekable: in the general case you can't go back.
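To see this one-shot behavior without a server, here is a minimal sketch (not taken from either answer). It assumes a hypothetical `OneShotStream` class standing in for a socket-backed body: readable, but not seekable. The URL `http://example.com/api` and the JSON body are made-up test values. Reading the error twice returns the body and then `b''`, and `err.seek(0)` fails because, in CPython 3, `HTTPError` forwards file methods such as `seek` to the stream it wraps.

```python
import io
from urllib.error import HTTPError


class OneShotStream(io.RawIOBase):
    """Hypothetical stand-in for a socket-backed body: readable but not seekable."""

    def __init__(self, data: bytes):
        super().__init__()
        self._buf = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        return self._buf.readinto(b)


body = io.BufferedReader(OneShotStream(b'{"error": "boom"}'))
err = HTTPError("http://example.com/api", 500, "Internal Server Error", {}, body)

first = err.read()   # the full body
second = err.read()  # b'': the stream is exhausted
try:
    err.seek(0)      # forwarded to the underlying stream, which cannot rewind
    rewound = True
except io.UnsupportedOperation:
    rewound = False
```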

You could replace err with a new HTTPError instance that reads from an in-memory buffer (such as io.BytesIO) instead of the network, e.g. (not tested):

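```python
import io
from urllib.error import HTTPError

content = err.read()  # read the body once and keep the raw bytes
self.log.exception(content)
raise HTTPError(err.url, err.code, err.reason, err.headers, io.BytesIO(content))
```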
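A self-contained version of this idea, runnable without a network, might look like the sketch below. `simulate_urlopen` and `call_api` are hypothetical names: `simulate_urlopen` stands in for `urllib.request.urlopen(request)` and raises an `HTTPError` with an in-memory body, and a module-level `log` replaces `self.log`. The handler reads the body once, logs it, and raises a replacement `HTTPError` backed by `io.BytesIO`, so the caller can still read the body.

```python
import io
import logging
from urllib.error import HTTPError

log = logging.getLogger(__name__)


def simulate_urlopen():
    """Hypothetical stand-in for urllib.request.urlopen(request) failing with a 404."""
    raise HTTPError("http://example.com/api/items/42", 404, "Not Found", {},
                    io.BytesIO(b'{"detail": "no such item"}'))


def call_api():
    try:
        return simulate_urlopen()
    except HTTPError as err:
        content = err.read()  # consumes the original body exactly once
        log.exception("HTTP %s from %s, body: %r", err.code, err.url, content)
        # Re-raise a replacement whose body is served from memory
        raise HTTPError(err.url, err.code, err.reason, err.headers, io.BytesIO(content))


try:
    call_api()
except HTTPError as upstream:
    upstream_code = upstream.code
    upstream_body = upstream.read()  # still readable after it was logged
```

Because the replacement is raised inside the `except` block, Python chains the original error as `__context__`, so the traceback still shows where the first failure came from.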

Though I'm not sure that you should: consider handling the error in a single place instead, e.g. re-raise a more application-specific exception or leave the logging to an upstream handler.
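One way to follow that advice is sketched below, under assumptions: `ApiError`, `fetch`, and `failing_request` are hypothetical names, and `fetch` takes a zero-argument callable standing in for `urllib.request.urlopen(request)`. The body is read once where the `HTTPError` is caught and stored as plain bytes on an application-specific exception, so a single upstream handler can log or inspect it as often as it likes.

```python
import io
from urllib.error import HTTPError


class ApiError(Exception):
    """Hypothetical application-level error carrying the already-read response body."""

    def __init__(self, url: str, status: int, body: bytes):
        super().__init__(f"HTTP {status} from {url}")
        self.url = url
        self.status = status
        self.body = body  # plain bytes: can be read any number of times


def fetch(open_request):
    """open_request: hypothetical zero-argument callable wrapping urllib.request.urlopen(request)."""
    try:
        return open_request()
    except HTTPError as err:
        # Read the body once, here, and hand plain data upward
        raise ApiError(err.url, err.code, err.read()) from err


def failing_request():
    raise HTTPError("http://example.com/api", 503, "Service Unavailable", {},
                    io.BytesIO(b"maintenance"))


try:
    fetch(failing_request)
except ApiError as exc:
    # The single place that logs or reports the error
    message = str(exc)
    status, body = exc.status, exc.body
```

Using `raise ... from err` keeps the original `HTTPError` available as `__cause__` for debugging, while callers only need to know about `ApiError`.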

jfs