
I need to use a GET request to send JSON to my server via a JavaScript client, so I started echoing responses back to make sure nothing is lost in translation. There doesn't seem to be a problem with normal text, but as soon as I include a non-ASCII character (e.g. "ç"), the character is escaped somehow (e.g. "\u00e7") and the returned value differs from the request value. My primary concerns are that A) my Python code correctly saves to the database what the client intended to send, and B) I echo back to the client the same values that were sent (when testing).

Perhaps this means I can't use base64, or have to do something different along the way. I'm ok with that. My implementation is just an attempt at a means to an end.

Current steps (any step can be changed, if needed):

Raw JSON string which I want to send to the server:

'{"weird-chars": "°ç"}'

JavaScript Base64 encoded version of the string passed to server via GET param (on a side note, will the equals sign at the end of the encoded string cause any issues?):

http://www.myserver.com/?json=eyJ3ZWlyZC1jaGFycyI6ICLCsMOnIn0=

Python str result from b64decode of param:

'{"weird-chars": "\xc2\xb0\xc3\xa7"}'

Python dict from json.loads of decoded param:

{'weird-chars': u'\xb0\xe7'}

Python str from json.dumps of that dict (and subsequent output to the browser):

'{"weird-chars": "\u00b0\u00e7"}'
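
The steps above can be reproduced in a short sketch. It assumes Python 3 (the session in this post is Python 2, where `str` plays the role of `bytes`), and it simulates the JavaScript client in Python. Percent-encoding the parameter with `urlencode` is an assumption added here so that the trailing `=` is carried unambiguously; the variable names are illustrative only.

```python
import base64
import json
from urllib.parse import urlencode, parse_qs

raw = '{"weird-chars": "°ç"}'

# Client side (simulated): UTF-8 encode, Base64 encode, then percent-encode
# the value so the "=" padding becomes %3D in the query string.
token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
query = urlencode({"json": token})

# Server side: parse the query string, undo Base64, then decode UTF-8.
param = parse_qs(query)["json"][0]
decoded_bytes = base64.b64decode(param)   # b'{"weird-chars": "\xc2\xb0\xc3\xa7"}'
data = json.loads(decoded_bytes.decode("utf-8"))

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(data)

print(token)    # eyJ3ZWlyZC1jaGFycyI6ICLCsMOnIn0=
print(data)     # {'weird-chars': '°ç'}
print(escaped)  # {"weird-chars": "\u00b0\u00e7"}
```

The dict holds the two original characters throughout; only the serialized JSON text uses the escaped spelling.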
orokusaki

2 Answers


Everything looks fine to me.

>>> hex(ord(u'°'))
'0xb0'
>>> hex(ord(u'ç'))
'0xe7'

Perhaps you should decode the JSON before attempting to use it.
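
As a rough illustration of that point, here is a minimal sketch (assuming Python 3; the variable names are made up for the example) showing that the escaped and unescaped JSON texts decode to exactly the same string, so nothing is lost once the JSON is parsed:

```python
import json

# Two spellings of the same JSON document: one with \uXXXX escapes,
# one with the literal characters.
escaped_text = '{"weird-chars": "\\u00b0\\u00e7"}'
literal_text = '{"weird-chars": "°ç"}'

from_escaped = json.loads(escaped_text)
from_literal = json.loads(literal_text)

# Both parse to the same two-character string.
print(from_escaped == from_literal)                          # True
print([hex(ord(ch)) for ch in from_escaped["weird-chars"]])  # ['0xb0', '0xe7']
```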

Ignacio Vazquez-Abrams
  • @Ignacio - I can't simply loop through each character in the string and convert it using `hex(ord(char))`. Is there a way I can guarantee that the characters are going to be stored in the DB correctly (during the Python `dict` step above), and also return them back to the browser correctly (ie, not encoded), all while not causing any issues or potential bugs? By correctly, I mean that if the user types `ççç` as their name, when they come back to the page, their name doesn't show up as `0xe70xe70xe7`. – orokusaki Dec 17 '10 at 20:29
  • @Ignacio - How is `loads` getting me the "decoded" value, if it's still `\xb0\xe7` hex encoded? I'm just trying to understand. Is `°` supposed to be stored as `xb0` in the database, etc and so you consider that to be "decoded"? Or, are you suggesting to "decode" it some other way before running `loads`? in my example above, I'm already using `loads`, in the second from last step. The browser is still receiving the string from the last step, which is incorrect. – orokusaki Dec 17 '10 at 20:38
  • @Ignacio - ok, that makes sense now. Why is the browser receiving the same thing as what `repr()` shows? Is there a step I'm missing at the end? – orokusaki Dec 17 '10 at 20:41
  • @Ignacio - ok, thanks. I previously thought you just meant decode on the server side in your answer. – orokusaki Dec 17 '10 at 20:59
  • How would you use the JSON on the client without decoding it? – Ignacio Vazquez-Abrams Dec 17 '10 at 21:00
  • @Ignacio - I will decode it from JSON to a JavaScript object, but I didn't want to have to also decode special characters. JSON doesn't require ASCII-only characters. If I replace `json.dumps(my_dict)` with `json.dumps(my_dict, ensure_ascii=False)`, it works without encoding Unicode characters. Would that be wrong to do? – orokusaki Dec 17 '10 at 21:09
  • Being able to handle the "special characters" is part of JSON. If whatever you're using to decode it can't handle them, then it's not a JSON library. – Ignacio Vazquez-Abrams Dec 17 '10 at 21:21
  • That's what I'll do then. I'm not trying to argue the correct way. I'm simply trying to understand the "why" of the "correct" way, so that I don't go the wrong direction. I want the API users to be able to use standard practices and not have to take extra steps, but it sounds like you're saying any `JSON.decode()` method in JS will do the appropriate conversions. – orokusaki Dec 17 '10 at 21:26
  • I don't know that *any* `JSON.decode()` will work. But the ones built into the browsers as well as the one supplied in `json2.js` should work. – Ignacio Vazquez-Abrams Dec 17 '10 at 21:32
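
The `ensure_ascii=False` option discussed in the comments above can be compared with the default in a small sketch. It assumes Python 3, and `my_dict` and `body` are illustrative names. If the unescaped form is sent, the response bytes would also need to be encoded (and declared) as UTF-8, which is an assumption about the surrounding web framework and is not shown here.

```python
import json

my_dict = {"weird-chars": "°ç"}

ascii_text = json.dumps(my_dict)                        # escapes non-ASCII
unicode_text = json.dumps(my_dict, ensure_ascii=False)  # keeps the characters

# Bytes that would go on the wire if the unescaped form is sent as UTF-8.
body = unicode_text.encode("utf-8")

print(ascii_text)    # {"weird-chars": "\u00b0\u00e7"}
print(unicode_text)  # {"weird-chars": "°ç"}

# A conforming JSON parser yields the same data from either form.
print(json.loads(ascii_text) == json.loads(body.decode("utf-8")))  # True
```

Both outputs are valid JSON; the choice mainly affects how the payload looks on the wire, not what the client gets after parsing.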

Your procedure's fine; you just need one more step: encoding from unicode to UTF-8 (or any other encoding that supports the 'weird characters').

Think of decoding as what you do to go from a regular string to unicode and encoding as what you do to get back from unicode. In other words:

You de-code a str to produce a unicode string

and en-code a unicode string to produce a str.

So:

params = {'weird-chars': u'\xb0\xe7'}

encodedchars = params['weird-chars'].encode('utf-8')

encodedchars will contain your characters as a str of bytes in the selected encoding (in this case, UTF-8).
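
For comparison, a minimal sketch of the same encode/decode round trip under Python 3, where the Python 2 `unicode` type is called `str` and the Python 2 `str` type corresponds to `bytes`. This mapping to Python 3 is an addition for illustration and not part of the original answer.

```python
# Text (code points U+00B0 and U+00E7), i.e. "°ç".
params = {"weird-chars": "\xb0\xe7"}

# Encode: text -> bytes in a chosen encoding.
encodedchars = params["weird-chars"].encode("utf-8")

# Decode: bytes -> text, using the same encoding.
roundtrip = encodedchars.decode("utf-8")

print(encodedchars)                         # b'\xc2\xb0\xc3\xa7'
print(roundtrip == params["weird-chars"])   # True
```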

Aphex