1

So I adapted urllib2 as suggested by answers to another question:

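import urllib2

class HttpRequest(urllib2.Request):
  # Accept an extra 'method' keyword so verbs other than GET/POST can be used.
  def __init__(self, *args, **kwargs):
    self._method = kwargs.pop('method', 'GET')
    urllib2.Request.__init__(self, *args, **kwargs)
  # urllib2 calls get_method() to decide which HTTP verb to send.
  def get_method(self):
    return self._method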

and it works nicely for PUT with JSON:

req = HttpRequest(url=url, method='PUT', 
    data=json.dumps(metadata))
response = urllib2.urlopen(req)

but it fails when data= is binary data (partial stack trace below):

  File "c:\appl\python\2.7.2\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "c:\appl\python\2.7.2\lib\urllib2.py", line 394, in open
    response = self._open(req, data)
  File "c:\appl\python\2.7.2\lib\urllib2.py", line 412, in _open
    '_open', req)
  File "c:\appl\python\2.7.2\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "c:\appl\python\2.7.2\lib\urllib2.py", line 1199, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "c:\appl\python\2.7.2\lib\urllib2.py", line 1168, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "c:\appl\python\2.7.2\lib\httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "c:\appl\python\2.7.2\lib\httplib.py", line 989, in _send_request
    self.endheaders(body)
  File "c:\appl\python\2.7.2\lib\httplib.py", line 951, in endheaders
    self._send_output(message_body)
  File "c:\appl\python\2.7.2\lib\httplib.py", line 809, in _send_output
    msg += message_body
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 10: ordinal not in range(128)

Is there a way I can fix this?

Jason S
  • What's the type of `metadata`? If it is `unicode`, encode it in some encoding first. – Sven Marnach Nov 02 '11 at 16:02
  • It's a file (pdf or jpg or something), could be several megabytes, so I'm looking for something efficient. If urllib2 isn't it, then oh well. – Jason S Nov 02 '11 at 16:19

3 Answers

1

It's because

data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.

(Quoted from the urllib2 documentation.)
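For reference, here is a minimal sketch of what the quoted format looks like in practice. The field names and values are made up for illustration, and the import falls back to `urllib.parse` so the snippet also runs on Python 3. Note that this format describes form submissions; it is not necessarily what a PUT of JSON or file contents needs.

```python
try:
    from urllib import urlencode        # Python 2, as used in the question
except ImportError:
    from urllib.parse import urlencode  # Python 3 location of the same function

# Hypothetical form fields; urlencode accepts a mapping or a sequence of 2-tuples.
fields = [("title", "report"), ("note", "a b&c")]

# Spaces become '+', reserved characters such as '&' are percent-encoded.
form_body = urlencode(fields)
print(form_body)  # title=report&note=a+b%26c
```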

Xavier Combelle
  • The documentation is phrased somewhat misleadingly on that point. data should be *already encoded*, and that *usually* means `application/x-www-form-urlencoded` in a POST request. – phihag Jan 02 '12 at 22:39
1

You are trying to automatically convert a Python unicode string to a regular byte string. JSON is always unicode, but HTTP must send bytes. If you are confident that the receiver will understand the JSON-encoded data in a particular encoding, you can just encode it that way:

>>> urllib2.urlopen(urllib2.Request("http://example.com", data=u'\u0ca0'))
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec cannot encode character u'\u0ca0' in position 0: ordinal not in range(128)
>>> urllib2.urlopen(urllib2.Request("http://example.com", 
...                                 data=u'\u0ca0'.encode('utf-8')))
<addinfourl at 15700984 whose fp = <socket._fileobject object at 0xdfbe50>>
>>> 

Note the .encode('utf-8'), which converts the unicode object to a UTF-8 encoded str. The implicit conversion would use ASCII, which can't encode non-ASCII characters.
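The implicit ASCII conversion can fail in either direction, which may explain why the question shows a UnicodeDecodeError while the session above shows a UnicodeEncodeError. The sketch below performs both conversions explicitly. The sample bytes are made up (they merely start with 0xe2, the byte named in the question's traceback), and the idea that the question's failure came from byte data being joined to a unicode URL or header is an assumption, since the thread does not show how `url` was built.

```python
# Made-up sample data: UTF-8 bytes beginning with 0xe2, and a non-ASCII character.
raw = b"\xe2\x80\x99"
text = u"\u0ca0"

# bytes -> text with the ASCII codec fails on any byte >= 0x80.
try:
    raw.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# text -> bytes with the ASCII codec fails on any non-ASCII character.
try:
    text.encode("ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# Naming the codec explicitly makes both conversions succeed.
as_text = raw.decode("utf-8")
as_bytes = text.encode("utf-8")
```

Either way, the remedy is the same: make sure everything handed to the HTTP layer is already a byte string.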

tl;dr ... data=json.dumps(blabla).encode('utf-8') ...
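Put together with the `HttpRequest` class from the question, that could look like the sketch below. The URL, the `metadata` contents, and the Content-Type header are assumptions made for illustration; the import falls back to `urllib.request` so the snippet also runs on Python 3. The request is only built here, not sent.

```python
import json

try:
    import urllib2 as urlreq         # Python 2, as in the question
except ImportError:
    import urllib.request as urlreq  # Python 3 successor of urllib2


class HttpRequest(urlreq.Request):
    """Request subclass from the question: lets the caller pick the HTTP verb."""

    def __init__(self, *args, **kwargs):
        self._method = kwargs.pop('method', 'GET')
        urlreq.Request.__init__(self, *args, **kwargs)

    def get_method(self):
        return self._method


# Hypothetical metadata containing a non-ASCII character.
metadata = {u"title": u"caf\u00e9"}

# Encode explicitly so the body is a byte string before it reaches the HTTP layer.
body = json.dumps(metadata, ensure_ascii=False).encode("utf-8")

req = HttpRequest(url="http://example.com/item", method="PUT", data=body,
                  headers={"Content-Type": "application/json; charset=utf-8"})
# Sending it would be: response = urlreq.urlopen(req)
```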

SingleNegationElimination
-1

According to the urllib2 documentation, you will need to percent-encode the byte-string.

wberry
  • Could you source that? It's simply not true, as you can see in @TokenMacGuy's answer. – phihag Jan 02 '12 at 22:41
  • Your comment (and, I assume, the downvote that goes with it) is surprising, considering that the accepted answer concurs. In the documentation that I linked to, you will find: "data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format." This describes the second argument to the `Request` constructor. If a server somewhere happens to accept arbitrary bytes in the HTTP GET request, then fine, but that would be non-standard behavior as far as I know. – wberry Jan 03 '12 at 05:40
  • Or if the `Request` constructor applies the `urlencode` operation (same as percent-encode) for you, then that would allow TokenMacGuy's code to work just fine, but it would also be a documentation error, and still not a problem with my citation. – wberry Jan 03 '12 at 05:47
  • Judging from your comment in response to the accepted correct answer, I wonder whether you are confusing the character encoding and percent-encoding. Percent encoding (or URL encoding) is used to encode byte values as characters. So `%41` doesn't mean "A", it means 0x41. Percent encoding tells you nothing about the meaning of the bytes, only their values. So URL-encoding U+0101 is a two-step process: `U+0101` --> `0xc4 0x81` (if using UTF-8) --> `%C4%81`. Hence the need for both transformations. – wberry Jan 03 '12 at 05:55
  • Maybe I shouldn't have downvoted you: While the answer is wrong, it's based on the misleading documentation. `Request` does **not** apply any transformation on data. *If* data is the representation of a typical form submission, one needs to properly encode it. In the case of a form submission, that entails encoding it to a list of tuples of bytes (in, say, UTF-8), and then percent-encoding (and adding `&`, `=` as glue between the tuples). [continued ..] – phihag Jan 03 '12 at 12:24
  • However, the OP does *not* want to submit a form. He wants to submit arbitrary binary data (but struggles with converting the string he has to bytes). And in the case of binary data (JSON bytes or [an image](http://stackoverflow.com/q/8705962/35070)), one does not need to apply any encoding to `data`. You can see that in [@TokenMacGuy's answer](http://stackoverflow.com/a/7983629/35070), or hopefully even more clearly (since strings aren't involved) in [mine](http://stackoverflow.com/a/8706029/35070). – phihag Jan 03 '12 at 12:29
  • I agree that it is not a problem to provide arbitrary binary data in the body of an HTTP PUT. However, from the documentation, it appears the `Request` class is going to give the data a content-type of `application/x-www-form-urlencoded`. In which case I would think that not percent-encoding it would be wrong. Now if the content-type were changed to `application/octet-stream`, then I would think it perfectly correct to include arbitrary bytes. – wberry Jan 03 '12 at 16:25
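The distinction discussed in these comments can be sketched as follows. The first part reproduces the two-step U+0101 example (character encoding, then percent-encoding). The second part builds a PUT request whose body is raw bytes labelled as `application/octet-stream`, so nothing needs to be percent-encoded. The URL and the payload bytes are placeholders, and the request is only constructed, not sent.

```python
try:
    from urllib import quote              # Python 2
    import urllib2 as urlreq
except ImportError:
    from urllib.parse import quote        # Python 3
    import urllib.request as urlreq

# Step 1: character encoding turns text into bytes.
utf8_bytes = u"\u0101".encode("utf-8")
# Step 2: percent-encoding turns those byte values into ASCII characters.
percent_encoded = quote(utf8_bytes)
print(percent_encoded)  # %C4%81

# Raw binary body: passed through unchanged, with an explicit content type.
payload = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real file
req = urlreq.Request("http://example.com/upload", data=payload,
                     headers={"Content-Type": "application/octet-stream"})
req.get_method = lambda: "PUT"  # override the verb on this one instance
```

Setting the Content-Type header explicitly matters because, when a body is present and no content type is given, urllib2 labels the request as `application/x-www-form-urlencoded`.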