Meh, I'm not a fan of UTF-8 in Python; I can't figure out how to solve this. As you can see, I'm already trying to base64-encode the value, but it looks like Python is trying to convert it from UTF-8 to ASCII first.

In general, I'm trying to POST form data that contains UTF-8 characters with urllib2. I guess it's the same as "How to send utf-8 content in a urllib2 request?", though there is no valid answer on that one. I'm trying to send only a byte string by base64-encoding it.

Traceback (most recent call last):
  File "load.py", line 165, in <module>
    main()
  File "load.py", line 17, in main
    beers()
  File "load.py", line 157, in beers
    resp = send_post("http://localhost:9000/beers", beer)
  File "load.py", line 64, in send_post
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
  File "load.py", line 49, in encode_multipart_data
    lines.extend (encode_field (name))
  File "load.py", line 34, in encode_field
    '', base64.b64encode(u"%s" % data[field_name]))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)

Code:

def random_string (length):
    return ''.join (random.choice (string.ascii_letters) for ii in range (length + 1))


def encode_multipart_data (data, files):
    boundary = random_string (30)

    def get_content_type (filename):
      return mimetypes.guess_type (filename)[0] or 'application/octet-stream'

    def encode_field (field_name):
      return ('--' + boundary,
              'Content-Disposition: form-data; name="%s"' % field_name,
              'Content-Transfer-Encoding: base64',
              '', base64.b64encode(u"%s" % data[field_name]))

    def encode_file (field_name):
      filename = files [field_name]
      file_size = os.stat(filename).st_size
      file_data = open(filename, 'rb').read()
      file_b64 = base64.b64encode(file_data)
      return ('--' + boundary,
              'Content-Disposition: form-data; name="%s"; filename="%s"' % (field_name, filename),
              'Content-Type: %s' % get_content_type(filename),
              'Content-Transfer-Encoding: base64',
              '', file_b64)

    lines = []
    for name in data:
      lines.extend (encode_field (name))
    for name in files:
      lines.extend (encode_file (name))
    lines.extend (('--%s--' % boundary, ''))
    body = '\r\n'.join (lines)

    headers = {'content-type': 'multipart/form-data; boundary=' + boundary,
               'content-length': str(len(body))}

    return body, headers


def send_post (url, data, files={}):
    req = urllib2.Request (url)
    connection = httplib.HTTPConnection (req.get_host())
    connection.request ('POST', req.get_selector(), *encode_multipart_data (data, files))
    return connection.getresponse()

The beer object's json is (this is the data being passed into encode_multipart_data):

    {
    "name"        : "Yuengling Oktoberfest",
    "brewer"      : "Yuengling Brewery",
    "description" : "America’s Oldest Brewery is proud to offer Yuengling Oktoberfest Beer. Copper in color, this medium bodied beer is the perfect blend of roasted malts with just the right amount of hops to capture a true representation of the style. Enjoy a Yuengling Oktoberfest Beer in celebration of the season, while supplies last!",
    "abv"         : 5.2, 
    "ibu"         : 26, 
    "type"        : "Lager",
    "subtype"     : "",
    "color"       : "",
    "seasonal"    : true,
    "servingTemp" : "Cold",
    "rating"      : 3,
    "inProduction": true  
    }
Justin808
  • How do you expect to base64 encode Unicode? Do you want to encode the raw UTF8 bytes as base64? – user2357112 Sep 17 '13 at 04:36
  • What is the value of `beer`? – Robᵩ Sep 17 '13 at 04:39
  • @Robᵩ - Added beer to the question – Justin808 Sep 17 '13 at 04:48
  • The error is referring to `'\u2019'`, which is a quote character `'’'`, that I don't see in your data. – Blckknght Sep 17 '13 at 05:40
  • @Blckknght - Sorry, I grabbed the wrong beer. I've changed it to the one with the `'’'` in it. But still, there has to be an issue with the handling of UTF in my code. – Justin808 Sep 17 '13 at 05:42
  • Why don't you post code that does what you actually want it to do instead? I don't see why you're trying to accomplish what you're doing here. – mikebabcock Sep 17 '13 at 05:50
  • @mikebabcock - I'm trying to send some stuff over a POST. But that's not the issue; that works. I'm getting a UTF conversion error. `base64.b64encode(u"%s" % data[field_name])` is failing. I don't know how to make it not fail. If I remove the b64 encode and just use `u"%s" % data[field_name]` it fails too, so `u"%s" % data[field_name]` must not be correct. Really the only thing not shown is `resp = send_post("http://localhost:9000/beers", beer)`, calling the send_post function. – Justin808 Sep 17 '13 at 05:54
  • cf. http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python and tell us if that helps. – mikebabcock Sep 17 '13 at 06:12

1 Answer

You can't base64-encode Unicode, only byte strings. In Python 2.7, giving a Unicode string to a function that requires a byte string causes an implicit conversion to a byte string using the ascii codec, resulting in the error you see:

>>> base64.b64encode(u'America\u2019s')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 7: ordinal not in range(128)
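
The implicit conversion can be reproduced explicitly, which makes it easier to see where the failure comes from. This is only a minimal sketch: the variable names are illustrative, and calling `.encode('ascii')` by hand is assumed to be roughly what Python 2.7 does behind the scenes when a byte string is required.

```python
# Illustrative only: reproduce the implicit ASCII conversion by hand.
text = u'America\u2019s'  # contains U+2019, a right single quotation mark

try:
    # Assumed to mirror what Python 2.7 attempts implicitly.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    failed_codec = exc.encoding  # name of the codec that could not encode the text
    failed_at = exc.start        # index of the first character it rejected
```

The codec and position recorded here match the ones reported in the traceback above.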

So encode it to a byte string using a valid encoding first:

>>> base64.b64encode(u'America\u2019s'.encode('utf8'))
'QW1lcmljYeKAmXM='
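
Applied to the question's `encode_field`, the same idea might look like the sketch below. The helper name `encode_field_value` and its `encoding` parameter are hypothetical and not part of the original code. It assumes the receiving server expects UTF-8, and it converts non-string values such as `5.2` or `true` to text before encoding them.

```python
import base64


def encode_field_value(value, encoding='utf-8'):
    """Return the base64 form of a form-field value as a byte string.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    if not isinstance(value, bytes):
        # Text and non-string values (floats, booleans, ...) are rendered
        # as text first, then encoded to bytes explicitly, so no implicit
        # ASCII conversion is ever attempted.
        value = (u"%s" % value).encode(encoding)
    return base64.b64encode(value)


encoded = encode_field_value(u'America\u2019s')
```

In `encode_field`, the last tuple element would then become `encode_field_value(data[field_name])` in place of `base64.b64encode(u"%s" % data[field_name])`.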
Mark Tolonen