21

I'm using json.dump() and json.load() to save/read a dictionary of strings to/from disk. The problem is that I don't want any of the strings to come back as unicode, but they do no matter how I set the parameters to dump/load (including ensure_ascii and encoding).
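
For example (filename and data made up for illustration), in Python 2:

import json

d = {'foo': 'bar'}
with open('data.json', 'w') as f:
    json.dump(d, f)

with open('data.json') as f:
    loaded = json.load(f)

print loaded                # {u'foo': u'bar'} -- everything comes back as unicode
print type(loaded['foo'])   # <type 'unicode'>, not <type 'str'>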

meteoritepanama
  • Please post your actual code and any error messages you are getting. Thank you. – mechanical_meat Mar 06 '12 at 19:22
  • What do you mean by “in Unicode”? Are you talking about the escaping of non-ASCII characters to `\u1234` escapes? If so why is this encoding not acceptable? It's perfectly valid JSON which any parser must accept; there are characters which *must* be encoded in this format even if in general you are leaving non-ASCII characters unescaped. – bobince Mar 07 '12 at 17:00
  • Because JSON is natively UTF-8. Python's json.loads() accepts non-ASCII symbols and parses them into Unicode strings; it parses all strings into the 'unicode' Python type, not 'str'. But json.dumps() escapes **all** non-ASCII symbols! So, string != json.dumps( json.loads( string ) ) – Brian Cannard Apr 11 '14 at 11:38
  • Retagged because this question does not make sense in a 3.x context. – Karl Knechtel Aug 04 '22 at 22:25

2 Answers

29

If you are just dealing with simple JSON objects whose keys and values are all strings, you can use the following:

import json

def ascii_encode_dict(data):
    # Encode every key and value of the (flat, all-string) object as a byte string.
    ascii_encode = lambda x: x.encode('ascii')
    return dict(map(ascii_encode, pair) for pair in data.items())

json.loads(json_data, object_hook=ascii_encode_dict)

Here is an example of how it works:

>>> json_data = '{"foo": "bar", "bar": "baz"}'
>>> json.loads(json_data)                                # old call gives unicode
{u'foo': u'bar', u'bar': u'baz'}
>>> json.loads(json_data, object_hook=ascii_encode_dict) # new call gives str
{'foo': 'bar', 'bar': 'baz'}

This answer works for a more complex JSON structure, and gives a nice explanation of the object_hook parameter. There is also another answer there that recursively takes the result of a json.loads() call and converts all of the Unicode strings to byte strings, as sketched below.
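
For completeness, the recursive approach amounts to something like this (a rough Python 2 sketch of the idea, not the linked answer verbatim):

def byteify(value):
    # Walk the decoded structure and turn every unicode string into a byte string.
    if isinstance(value, unicode):
        return value.encode('ascii')   # or 'utf-8' if non-ASCII data is expected
    if isinstance(value, list):
        return [byteify(item) for item in value]
    if isinstance(value, dict):
        return dict((byteify(k), byteify(v)) for k, v in value.iteritems())
    return value

>>> byteify(json.loads('{"nested": {"foo": ["bar", "baz"]}}'))
{'nested': {'foo': ['bar', 'baz']}}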

Andrew Clark
  • You might want to change the line to `ascii_encode = lambda x: x.encode('ascii','ignore')` if the data have many characters whose `ord > 128`. – mac389 Nov 27 '12 at 15:57
  • The link to the other answer for the more complex JSON structure was exactly what I needed. – adg Apr 07 '17 at 08:53
15

And if the JSON object is a mix of data types, not only unicode strings, you can use this variation:

def ascii_encode_dict(data):
    ascii_encode = lambda x: x.encode('ascii') if isinstance(x, unicode) else x 
    return dict(map(ascii_encode, pair) for pair in data.items())
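
For example, string values are converted while other types pass through untouched (sample data invented for illustration):

>>> json_data = '{"name": "foo", "count": 3}'
>>> result = json.loads(json_data, object_hook=ascii_encode_dict)
>>> type(result['name']), type(result['count'])
(<type 'str'>, <type 'int'>)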
tornord