
I've been struggling with the following problem for the past half day, and although I've found some information about similar problems, nothing quite matches my case.

I'm trying to send a PUT request using urllib2 with data that contains some Unicode characters:

body = u'{ "bbb" : "asdf\xd7\xa9\xd7\x93\xd7\x92"}'
conn = urllib2.Request(request_url, body, headers)
conn.get_method = lambda: 'PUT'
response = urllib2.urlopen(conn)

I've tried to use body = body.encode('utf-8') and other variations, but whatever I do I get the following error:

UnicodeDecodeError at ...
'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

With one of the following call stacks:

File "..." in ...
  195.         response = urllib2.urlopen(conn)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in urlopen
  126.     return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in open
  394.         response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _open
  412.                                   '_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _call_chain
  372.             result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in http_open
  1199.         return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in do_open
  1168.             h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in request
  955.         self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_request
  989.         self.endheaders(body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in endheaders
  951.         self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_output
  815.             self.send(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in send
  787.             self.sock.sendall(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py" in meth
  224.     return getattr(self._sock,name)(*args)

Or the following call stack (for when I do body = body.encode('utf-8')):

File "..." in ...
  195.         response = urllib2.urlopen(conn)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in urlopen
  126.     return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in open
  394.         response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _open
  412.                                   '_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in _call_chain
  372.             result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in http_open
  1199.         return self.do_open(httplib.HTTPConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py" in do_open
  1168.             h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in request
  955.         self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_request
  989.         self.endheaders(body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in endheaders
  951.         self._send_output(message_body)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py" in _send_output
  809.             msg += message_body

What am I doing wrong? How can I send a body with Unicode characters via urllib2? If there are no Unicode characters, everything works fine.

Also note that my Content-Type header is set to application/json;charset=utf-8.

If it's relevant in any way, the context is this: my Django server receives a request, and I delegate it to another Django server. I don't redirect; I send the request from my own server, get the response, and send it back. So body is the request.body in the Django view.

Edit:

My headers are:

{
'Origin': 'http://10.0.0.146:8000', 
'Accept-Language': 'en-US,en;q=0.8', 
'Accept-Encoding': 'gzip,deflate,sdch', 
'Host': 'localhost:5000', 
'Accept': 'application/json, text/plain, */*', 
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31', 
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 
'Connection': 'keep-alive', 
'X-Requested-With': 'XMLHttpRequest', 
'Pragma': 'no-cache', 
'Cache-Control': 'no-cache', 
'Referer': 'http://localhost:5000/', 
'Content-Type': 'application/json;charset=UTF-8', 
'Authorization': 'ApiKey ogkLPgSESNyTOgIdbSLDhJjvyVJcbg:0d5897b5204c2f2527f532c6a97ba18a7f06acdc', 
'Cookie': 'username=ogkLPgSESNyTOgIdbSLDhJjvyVJcbg; _we_wk_ls_=%7B%22time%22%3A1369123506709%7D; __jwpusr=39e63770-ec5c-4b96-9f7f-b199703d0d36; sessionid=0d741a7560258b301979a1c853b42a81; api_key=0d5897b5204c2f2527f532c6a97ba18a7f06acdc'
}
Ofirov
  • What is the traceback when you do encode to UTF-8 first (which is the correct course of action)? – Martijn Pieters May 21 '13 at 12:35
  • Edited the question. The second traceback is for when I encode to UTF-8. – Ofirov May 21 '13 at 12:40
  • Don't include the raw bytes in a unicode string prefixed with `u`. A unicode string should contain unicode codepoints, not encoded UTF-8. – Wooble May 21 '13 at 12:43
  • @Wooble So maybe my question should be: How to convert Django's HttpRequest.body to something I can send in the body of urllib2's request? – Ofirov May 21 '13 at 12:46
  • Does getting rid of the `u` prefix work? You appear to already have the correct bytes, so there's no need to encode. – Wooble May 21 '13 at 12:50
  • @Wooble No, without the `u` and without the `encode('utf-8')` I get the second traceback. It seems that `msg` (which is constructed by urllib2) is a Unicode (`u`) object. Thus on `msg += message_body`, Python tries to do some sort of automatic conversion and raises an error. – Ofirov May 21 '13 at 13:02
  • What are your headers? You need to pass only byte strings to `Request`. This applies to the headers, the url and the body. – Martijn Pieters May 21 '13 at 13:03
  • I managed to get it working by passing the body encoded in UTF-8. – Paulo Bu May 21 '13 at 13:11
  • @MartijnPieters Yes! My URL (which I received as a view parameter) was unicode and that's what caused `msg` to be unicode and everything to fail. Thanks! – Ofirov May 21 '13 at 13:45

1 Answer


You need to pass only byte strings to Request. This applies to the headers, the url and the body.
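
One way to enforce this is to normalise all three inputs just before building the request. The following is a minimal sketch, not part of urllib2: the helper names `to_bytes` and `bytes_request_parts` are hypothetical, and UTF-8 is an assumed encoding (a URL with non-ASCII characters would normally need percent-encoding rather than plain encoding). It relies on `bytes` being an alias for `str` on Python 2.

```python
def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding it only if it is text."""
    if isinstance(value, bytes):
        # Already a byte string (str on Python 2): leave it untouched.
        return value
    return value.encode(encoding)


def bytes_request_parts(url, body, headers, encoding='utf-8'):
    """Return (url, body, headers) with every text value converted to bytes."""
    byte_headers = dict(
        (to_bytes(name, encoding), to_bytes(val, encoding))
        for name, val in headers.items()
    )
    return to_bytes(url, encoding), to_bytes(body, encoding), byte_headers
```

On Python 2, the three results could then be passed to `urllib2.Request(url, body, headers)` exactly as in the question, so that nothing unicode reaches the point where the request message is concatenated.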

If any of those three inputs contain Unicode values, automatic conversions between Unicode and strings will take place when concatenating, which will invariably lead to grief.
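
The failure can be reproduced in isolation. This sketch assumes Python 2's default encoding is ASCII (the usual setting) and makes the implicit step explicit: when a unicode object is concatenated with a byte string, the byte string is first decoded as ASCII. The function name `implicit_concat` and the request line are illustrative only.

```python
# A UTF-8 encoded body containing non-ASCII characters.
utf8_body = u'asdf\u05e9\u05d3\u05d2'.encode('utf-8')


def implicit_concat(unicode_prefix, byte_body):
    """Spell out what Python 2 does for `unicode_prefix + byte_body`."""
    return unicode_prefix + byte_body.decode('ascii')


try:
    # A unicode URL makes the whole message prefix unicode.
    implicit_concat(u'PUT /path HTTP/1.1\r\n\r\n', utf8_body)
    failed = False
    error_text = ''
except UnicodeDecodeError as exc:
    failed = True
    error_text = str(exc)

# With byte strings on both sides, no decoding happens at all.
all_bytes = b'PUT /path HTTP/1.1\r\n\r\n' + utf8_body
```

Here `failed` ends up `True`, with the same kind of "'ascii' codec can't decode byte" message as in the question, while the all-bytes concatenation succeeds.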

Martijn Pieters