I have a Python dictionary of JSON-serialized values.

I want to add to these serialized strings without first calling loads(...) and later calling dumps(...), so I 'fiddle' with the serialized values directly.

Currently I have:

from json import dumps

for key, value in my_dict.items():
    # Serialize the additional data I want in the JSON string.
    extra = dumps({ 'key1': 3, 'key2': 1 }, default=str)

    # Cut the closing '}' off the end of 'value' and the opening '{' off the
    # start of 'extra', then join the two with a comma.
    my_dict[key] = '%s,%s' % (value[:-1], extra[1:])

I am doing this because I consider the loads and dumps round trip a waste, but my current method is not very Pythonic.

Is there a better method?

Note: the 'extra' values are from a different source to the initial JSON values, and cannot be inserted at the point where the original data was serialized.

Time differences when using a dict of ~20 JSON blobs:

  • fiddling: 0.0005 seconds
  • json>py>json: 0.0025 seconds

5 times quicker

And for fun, with 20,000:

  • fiddling: 0.333 seconds
  • json>py>json: 0.813 seconds

about 2.4 times quicker (roughly 59% less time)

With 200,000:

  • fiddling: 4.5 seconds
  • json>py>json: 10.25 seconds

about 2.3 times quicker (roughly 56% less time)

Rich Tier
  • See my answer update. Using C libraries for unserializing JSON can be much faster than fiddling with the strings. – Hubro Jan 30 '13 at 13:06

1 Answer

The Pythonic way would be to parse the JSON string, modify the values, then serialize it again. JSON is very quick to parse, much faster than the standard pickle/unpickle functions, and will probably not slow you down unless you have enormous amounts of data (tens of thousands of lines). Don't fall into the trap of optimizing prematurely.
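As a minimal sketch of that round trip, assuming every value in the dictionary is a serialized JSON object (the helper name add_fields and the sample data are made up for illustration):

```python
import json

def add_fields(serialized, extra):
    """Parse a JSON object string, merge in extra keys, and serialize it again."""
    data = json.loads(serialized)
    data.update(extra)
    return json.dumps(data, default=str)

# Hypothetical sample data; '{}' is included on purpose as an edge case.
my_dict = {'a': '{"x": 1}', 'b': '{}'}
extra = {'key1': 3, 'key2': 1}

my_dict = {key: add_fields(value, extra) for key, value in my_dict.items()}
```

One thing the round trip gets you for free: an empty object such as '{}' still produces valid JSON, whereas splicing with value[:-1] and a comma would yield '{,"key1": ...', which is not valid.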

In any case, you should always write your application in a nice, Pythonic and readable fashion, then (if necessary!) optimize the slow parts of your code later.
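If you do want to find out whether this part is actually slow, timeit from the standard library is one way to measure it. The sketch below is only an illustration: the blob contents, the blob count and the function names splice and round_trip are assumptions, not taken from your data.

```python
import json
import timeit

EXTRA = {'key1': 3, 'key2': 1}

def make_blobs(n):
    """Build n small serialized JSON objects (made-up sample data)."""
    return {'item%d' % i: json.dumps({'id': i, 'name': 'x' * 20}) for i in range(n)}

def splice(blobs):
    """String 'fiddling': drop the closing brace and append the extra keys."""
    extra = json.dumps(EXTRA, default=str)
    return {k: '%s,%s' % (v[:-1], extra[1:]) for k, v in blobs.items()}

def round_trip(blobs):
    """Parse, update, and serialize every value."""
    out = {}
    for k, v in blobs.items():
        data = json.loads(v)
        data.update(EXTRA)
        out[k] = json.dumps(data, default=str)
    return out

if __name__ == '__main__':
    blobs = make_blobs(2000)
    for func in (splice, round_trip):
        # Average over a few passes to smooth out noise.
        seconds = timeit.timeit(lambda: func(blobs), number=5) / 5
        print('%s: %.4f seconds per pass' % (func.__name__, seconds))
```

The numbers will depend on your machine and on how large the real blobs are, so treat any single run as a rough guide.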


Another optimization would be to write the relevant code in C, or to use a C library for JSON serialization. Take a look at ultrajson, or at this answer, which explains how the simplejson library can be much faster than the standard json module you are using.
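A rough sketch of swapping in a C-backed library, assuming ultrajson is installed under its usual module name ujson and falling back to the standard json module otherwise (json_impl and add_fields are names made up for this example):

```python
try:
    import ujson as json_impl  # third-party and optional; assumed to be installed
except ImportError:
    import json as json_impl   # standard library fallback

def add_fields(serialized, extra):
    """Parse, update, and serialize using whichever JSON module was imported."""
    data = json_impl.loads(serialized)
    data.update(extra)
    return json_impl.dumps(data)

result = add_fields('{"x": 1}', {'key1': 3, 'key2': 1})
```

The default=str argument from your code is left out here on purpose, because support for it may differ between libraries and versions. If your extra values include types that are not JSON-native, check how the library you pick handles them.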

Hubro