9

Suppose I have a special object that holds a literal JSON string, which I intend to use as a field in a larger JSON object as the literal value itself (not as a string containing the JSON).

I want to write a custom encoder that can accomplish this, i.e.:

> encoder.encode({
>     'a': LiteralJson('{}')
> })
{"a": {}}

I don't believe subclassing JSONEncoder and overriding default will work, because at best there, I can return the string, which would make the result {"a": "{}"}.
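To make the failure mode concrete, here is a minimal sketch. The `LiteralJson` class with a `content` attribute and the `NaiveEncoder` name are assumptions for illustration; the question does not show the actual class definition.

```python
import json

class LiteralJson(object):
    """Hypothetical wrapper around already-serialized JSON text."""
    def __init__(self, content):
        self.content = content

class NaiveEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, LiteralJson):
            # Returning the raw text: the encoder treats it as an ordinary
            # Python string and quotes it again.
            return obj.content
        return json.JSONEncoder.default(self, obj)

encoded = json.dumps({'a': LiteralJson('{}')}, cls=NaiveEncoder)
print(encoded)  # {"a": "{}"}
```

Whatever `default` returns is encoded again as a normal Python value, so a returned string always ends up quoted.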

Overriding encode also appears not to work when the LiteralJson is nested somewhere inside another dictionary.

The background for this, if you are interested, is that I am storing JSON-encoded values in a cache, and it seems to me to be a waste to deserialize then reserialize all the time. It works that way, but some of these values are fairly long and it just seems like a huge waste.

The following encoder does what I want (but seems unnecessarily slow):

class MagicEncoder(json.JSONEncoder):

    def default(self, obj):
        if isinstance(obj, LiteralJson):
            return json.loads(obj.content)
        else:
            return json.JSONEncoder.default(self, obj)
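As a usage sketch for the encoder above, assuming a hypothetical `LiteralJson` class that exposes its text as `content` (the question only shows that attribute being read), nesting works because `default` is invoked wherever an unknown object appears:

```python
import json

class LiteralJson(object):
    """Hypothetical wrapper; only the `content` attribute is implied by the question."""
    def __init__(self, content):
        self.content = content

class MagicEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, LiteralJson):
            # Parse the cached text so the encoder can re-serialize it.
            return json.loads(obj.content)
        return json.JSONEncoder.default(self, obj)

nested = {'a': {'b': [LiteralJson('{"k": null}')]}}
encoded = json.dumps(nested, cls=MagicEncoder)
print(encoded)  # {"a": {"b": [{"k": null}]}}
```

The cost is one `json.loads` plus one re-encode per cached value, which is the overhead the question is trying to avoid.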
Kevin Dolan
  • I had exactly the same problem today, and I checked the Python 2.7 json module contents. Unfortunately this is impossible to achieve without forking the json module completely; all encoding happens in long nested functions. – Antti Haapala -- Слава Україні Sep 13 '12 at 10:10
  • And besides that, I think it prefers to use the C version when available, which would require even more forking. It's not a huge deal, we're talking about ns here, but it just feels wrong. – Kevin Dolan Sep 13 '12 at 14:15
  • Well, it is an issue if for example with the upcoming PostgreSQL 9.2 you are storing actual json documents in db, and expect to serve them fast. I for one thought this was possible, but no. – Antti Haapala -- Слава Україні Sep 14 '12 at 10:19

2 Answers

4

I've just realised I had a similar question recently. The answer suggested using a replacement token.

It's possible to integrate this logic more or less transparently using a custom JSONEncoder that generates these tokens internally using a random UUID. (What I've called "RawJavaScriptText" is the equivalent of your "LiteralJson".)

You can then use json.dumps(testvar, cls=RawJsJSONEncoder) directly.

import json
import uuid

class RawJavaScriptText:
    def __init__(self, jstext):
        self._jstext = jstext
    def get_jstext(self):
        return self._jstext

class RawJsJSONEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        json.JSONEncoder.__init__(self, *args, **kwargs)
        self._replacement_map = {}

    def default(self, o):
        if isinstance(o, RawJavaScriptText):
            key = uuid.uuid4().hex
            self._replacement_map[key] = o.get_jstext()
            return key
        else:
            return json.JSONEncoder.default(self, o)

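    def encode(self, o):
        result = json.JSONEncoder.encode(self, o)
        # Swap each quoted placeholder token for the raw text it stands for.
        # .items() works on both Python 2 and Python 3.
        for k, v in self._replacement_map.items():
            result = result.replace('"%s"' % (k,), v)
        return result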

testvar = {
   'a': 1,
   'b': 'abc',
   'c': RawJavaScriptText('{ "x": [ 1, 2, 3 ] }')
}

print(json.dumps(testvar, cls=RawJsJSONEncoder))

Result (using Python 2.6 and 2.7):

{"a": 1, "c": { "x": [ 1, 2, 3 ] }, "b": "abc"}
Bruno
-2

"it seems to me to be a waste to deserialize then reserialize all the time."

It is wasteful, but in case anybody is looking for a quick fix, this approach works fine.

Cribbing the example from Bruno:

testvar = {
   'a': 1,
   'b': 'abc',
   'c': json.loads('{ "x": [ 1, 2, 3 ] }')
}

print(json.dumps(testvar))

Result:

{"a": 1, "c": {"x": [1, 2, 3]}, "b": "abc"}
kibibu
  • I think the problem with this approach is that `LiteralJson('{}')` is meant to represent an object that would be in memory, whereas `json.loads('{ "x": [ 1, 2, 3 ] }')` is the result of a callable, already deserialised. – Bruno Feb 12 '14 at 18:03
  • @Bruno Yes, it is explicitly a serialize->deserialize->serialize sequence. Which is unpleasant, but certainly the simplest code that could possibly work. – kibibu Feb 13 '14 at 00:56
  • I suppose it depends on the objective. I'm not sure what the OP was aiming to do ultimately. In my question (linked from my answer), I was trying to insert JavaScript code in a few places (in what was otherwise a JSON structure), so my "LiteralJson" equivalent wasn't actually valid JSON, hence the specific encoder (because `json.loads` wouldn't have worked). I presume the OP's objective was at least to avoid this deserialize->serialize->deserialize chain. – Bruno Feb 13 '14 at 01:06