I'm trying to prevent a string (in this case the value variable) in a POST request from being escaped, as it is to be stored in JSON. My code is:

def addProduct(request):
    if request.POST:
        post = {}
        for key in request.POST:
            value = request.POST[key].encode('utf-8')
            try:
                value = json.loads(value).encode('utf-8')
            except Exception:
                pass
            post[key] = value.encode('utf-8')
        doc = json.dumps(post)

While debugging I can see that value is of type unicode, which I believe is how Django handles request data. Although the string is unicode, its special characters don't appear escaped until post[key] = value. If I change this to post[key] = value.encode('utf-8') to prevent the escaping, I get the error: 'ascii' codec can't decode byte 0xe2 in position 38: ordinal not in range(128)

Any ideas?

KingFu
  • In every `encode` call you have, try overriding it with `encode(encoding='UTF-8',errors='ignore')` – Victor Castillo Torres Jun 20 '13 at 19:15
  • `post` is a dictionary. Doing `post[key] = value` is not going to escape anything, so your description is not completely correct. Also how do you mean "escaped"? What is your output, and what output do you want? – Lennart Regebro Jun 20 '13 at 19:15
  • @LennartRegebro escaped as in having literal characters such as `'` replaced with `\xe2\x80\x99` – KingFu Jun 20 '13 at 19:17
  • This is unanswerable without (a) knowing the contents of `request.POST`, and (b) knowing what the final value of `doc` should be. – Aya Jun 20 '13 at 19:19
  • The contents of `request.POST` is just plain text from a HTML form. The final `doc` is JSON doc of strings, escaped only with backslashes, not for example `\xe2\x80\x99` – KingFu Jun 20 '13 at 19:31
  • json.dumps() will not put \xe2\x80\x99 characters in the output, but rather \\u00f6l. So your description doesn't match what really is happening, and that's confusing. – Lennart Regebro Jun 20 '13 at 19:35
  • @KingFu The UTF-8 sequence `\xe2\x80\x99` maps to some sort of fancy close single quote `’`. You can't just escape that with a backslash. If you want to translate it to a plain ASCII single quote `'`, you'll probably need to do it manually, or just accept that your final JSON-encoded dictionary will have to contain Unicode characters. – Aya Jun 20 '13 at 19:43
  • @LennartRegebro you're right `post[key] = value` escapes the string with `\xe2\x80\x99`, then the json.dumps() replaces this with `\\u00f6l` type escapes. I wanted just plain unicode with JSON backslash escaping but it seems this just isn't possible? – KingFu Jun 20 '13 at 20:21
  • @KingFu: As I already said: No, `post[key] = value` does not escape anything. Yes, it's possible, but then the resulting JSON data will be Unicode. I really mean exactly what I say. Please read it carefully and ask questions if I'm unclear. – Lennart Regebro Jun 20 '13 at 20:29
  • @LennartRegebro thanks, what I meant was I could see the Python interpreter seemed to be adding in the `\xe2` type escapes at this point implicitly, as the value of `value` wasn't escaped until this point. I need to read up a lot more on encoding... it fries my brain – KingFu Jun 20 '13 at 20:34
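
For readers puzzled by the `\xe2\x80\x99` discussed in the comments above: it is how Python displays the three UTF-8 bytes of the right single quotation mark (U+2019), not an escape added by the dictionary assignment. A minimal sketch in Python 3 syntax (an assumption, since the question uses Python 2), which also shows the ASCII decode step behind the reported error:

```python
# Python 3 sketch; the question itself targets Python 2.
quote = "\u2019"                   # RIGHT SINGLE QUOTATION MARK
encoded = quote.encode("utf-8")    # the three bytes displayed as \xe2\x80\x99

print(repr(encoded))               # the repr shows the bytes as hex escapes
print(encoded.decode("utf-8") == quote)

# Decoding those bytes as ASCII fails on the first byte, 0xe2. Python 2 does
# this decode implicitly when .encode() is called on a byte string.
try:
    encoded.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

print(ascii_error)
```

So the data is never modified by `post[key] = value`; only its displayed representation differs depending on whether it is a byte string or a Unicode string.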

2 Answers

If you want json.dumps to keep the special characters, I think you may find the argument ensure_ascii=False useful.

  1. Take a look at this answer: Unicode values in strings are escaped when dumping to JSON in Python
  2. This is the docs for json.dumps

Instead of encoding the strings yourself, I think passing ensure_ascii=False will solve the problem of json escaping the output.

Ex:

>>> json.dumps({'h': u'\xc2\xa3'}, ensure_ascii=False)
u'{"h": "\xc2\xa3"}'

UPDATE: Comparison of json.dumps with and without ensure_ascii and a unicode string:

In [7]: json.dumps({'a':u'\u00a3'},ensure_ascii=False)
Out[7]: u'{"a": "\xa3"}'

In [8]: json.dumps({'a':u'\u00a3'})
Out[8]: '{"a": "\\u00a3"}'
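
Applied to the loop in the question, the idea could look roughly like the sketch below. It is written for Python 3 (an assumption; the question uses Python 2), and `build_doc` and `form_items` are hypothetical names standing in for the view and `request.POST`, so no Django is needed to run it. Values that parse as JSON are kept as parsed objects, which is a guess at what the original `json.loads` call was meant to do.

```python
import json


def build_doc(form_items):
    """Hypothetical helper: form_items stands in for request.POST."""
    post = {}
    for key, value in form_items.items():
        try:
            # Fields that already hold JSON (numbers, lists, ...) are parsed.
            value = json.loads(value)
        except ValueError:
            # Plain text is kept as the Unicode string it already is.
            pass
        post[key] = value
    # ensure_ascii=False leaves non-ASCII characters unescaped in the output.
    return json.dumps(post, ensure_ascii=False)


doc = build_doc({"name": "Alice\u2019s tea", "price": "3.50"})
print(doc)
```

Note that there is no `.encode('utf-8')` anywhere: the strings stay Unicode until the document is written out or sent.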

Hope this helps!

Paulo Bu
  • Adding `ensure_ascii=False` to `json.dumps` gives the error `'ascii' codec can't decode byte 0xe2 in position 39: ordinal not in range(128)` – KingFu Jun 20 '13 at 20:16
  • Exactly, use the strings as normal unicode. See my answer, I updated it with the pound sign example dumping it with and without `ensure_ascii`. Just don't encode anything and call `dumps` with `ensure_ascii`. – Paulo Bu Jun 20 '13 at 20:23
  • Ah success!!! I removed all the `.encode('utf-8')` added the `ensure_ascii=False` and it works, thank you! – KingFu Jun 20 '13 at 20:28

I can't reproduce this. I tried giving json.dumps both Unicode objects and UTF-8 encoded byte strings, and in both cases I got correctly Unicode-escaped JSON data out:

>>> json.dumps({'foo': u'lölölö'})
'{"foo": "l\\u00f6l\\u00f6l\\u00f6"}'
>>> json.dumps({'foo': u'lölölö'.encode('utf8')})
'{"foo": "l\\u00f6l\\u00f6l\\u00f6"}'

I tried this in Python 2.6 and 2.7, as well as in Python 3.1:

>>> json.dumps({'foo': 'lölölö'})
'{"foo": "l\\u00f6l\\u00f6l\\u00f6"}'
Lennart Regebro
  • I think I've not explained myself very well. I'm trying to prevent it being escaped. I just want to store unescaped strings in JSON – KingFu Jun 20 '13 at 19:25
  • @KingFu: You can do that, but then the JSON data you get back is a Unicode object. Is that what you want? – Lennart Regebro Jun 20 '13 at 19:33
  • Aren't all JSON strings unicode? It doesn't matter whether it's unicode/ascii as long as the strings don't get escaped. I'm using them in an Android app and the textview just displays the escaped characters literally – KingFu Jun 20 '13 at 19:58
  • @KingFu: No, by default under Python 2 JSON data is 8-bit strings. Under Python 3 it's unicode strings, yes. If it's ASCII the strings have to be escaped for obvious reasons. – Lennart Regebro Jun 20 '13 at 20:08
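
On the display problem mentioned in these comments: the escaped and unescaped forms are both valid JSON and decode to the same string, so a consumer that runs the document through a JSON parser should see identical text either way. A small sketch in Python 3 (assumed here; the outputs in the answer are from Python 2 and 3.1):

```python
import json

data = {"foo": "l\u00f6l\u00f6l\u00f6"}

escaped = json.dumps(data)                        # default: ASCII-only output
unescaped = json.dumps(data, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped)
print(unescaped)

# Both documents parse back to the same Python object.
print(json.loads(escaped) == json.loads(unescaped) == data)
```

If the escapes show up literally on screen, one possible cause is that the text is being displayed without being parsed as JSON first, though the thread does not confirm this for the Android app.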