
I have some Python code that parses JSON into objects using factory class methods.

Here's an example of one such class:

import json


class Rect(object):

    def __init__(self, x, y, width, height):
        self.x = int(x)
        self.y = int(y)
        self.width = int(width)
        self.height = int(height)

    @classmethod
    def from_dict(cls, raw_dict):
        # Unpack the dict's keys as keyword arguments to __init__.
        return cls(**raw_dict)

    @classmethod
    def from_json(cls, raw_json):
        # Parse the JSON text, then build the object from the resulting dict.
        d = json.loads(raw_json)
        return cls.from_dict(d)

When I pass raw, Python-serialized JSON into the class it works fine, but if I pass in JSON from a Flask web request (e.g. Flask's request.json), I get an error on class instantiation:

TypeError: __init__() keywords must be strings

That's because the JSON I'm getting back from the Flask request is all unicode, and apparently cls(**raw_dict) can't comprehend unicode keys.

In other words, this works:

Rect.from_dict({ "x": 0, "y": 0, "width": 100, "height": 100 })

But this won't:

Rect.from_dict({u'y': 0, u'width': 0, u'x': 0, u'height': 0})

So my question is actually two parts:

  1. How should I approach fixing this? Should I try to convert all incoming JSON from unicode to strings? Or should I try to somehow accommodate unicode?
  2. Why is this even an issue? Shouldn't Python be able to automatically digest unicode as strings? Does this issue manifest itself in languages like Ruby? (NOTE: In answers to this question, people suggest keeping things unicode because that's the JSON spec, but evidently Python doesn't allow simple replacement of strings with unicode, or otherwise my code would work.)

UPDATE:

This problem only manifests itself in Python <=2.6.

Yarin

1 Answer


The JSON standard uses Unicode exclusively, so all keys in a dictionary from json.loads() are always unicode values.
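A quick way to see this is to inspect the key types that json.loads() produces. This is only an illustrative check; the names `parsed`, `text_type` and `keys_are_text` are made up for the example. `type(u"")` is `unicode` on Python 2 and `str` on Python 3, so the same check runs on either.

```python
import json

# Parse a small JSON object, similar to the Rect payload in the question.
parsed = json.loads('{"x": 0, "y": 0, "width": 100, "height": 100}')

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")

# Every key produced by json.loads() is an instance of the text type.
keys_are_text = all(isinstance(key, text_type) for key in parsed)
print(keys_are_text)
```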

For Python 2.6 and earlier, these are easy enough to encode:

d = dict((key.encode('ascii'), value) for (key, value) in json.loads(raw_json).iteritems())

because Python 2 identifiers can only use ASCII characters. In Python 2.7 and up, unicode keywords are auto-encoded for you.

In Python 3, identifiers are unicode already and this isn't an issue.
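If the same code has to run on several Python versions, one option is to normalize the keys just before unpacking them as keyword arguments. The following is a minimal sketch, not the only way to do it. `with_str_keys` is a hypothetical helper name, and it assumes every key is ASCII-only: on Python 2, `str()` raises `UnicodeEncodeError` for a non-ASCII unicode key, while on Python 3 the keys are already `str`.

```python
import json


def with_str_keys(mapping):
    """Return a copy of `mapping` whose keys are native `str` objects.

    Hypothetical helper for illustration; assumes ASCII-only keys.
    """
    return dict((str(key), value) for key, value in mapping.items())


class Rect(object):

    def __init__(self, x, y, width, height):
        self.x = int(x)
        self.y = int(y)
        self.width = int(width)
        self.height = int(height)

    @classmethod
    def from_dict(cls, raw_dict):
        # Normalize the keys before unpacking them as keyword arguments.
        return cls(**with_str_keys(raw_dict))

    @classmethod
    def from_json(cls, raw_json):
        return cls.from_dict(json.loads(raw_json))


# Both call styles from the question now go through the same code path.
rect_from_unicode = Rect.from_dict({u'y': 0, u'width': 100, u'x': 0, u'height': 50})
rect_from_json = Rect.from_json('{"x": 0, "y": 0, "width": 100, "height": 100}')
```

Keeping the conversion inside from_dict means callers can pass the dict from request.json or from json.loads() unchanged.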

Martijn Pieters