I'm using Python to serialize a Python object so that I can store it in my cache. I serialize it with json.dumps() and, after getting it back out of the cache, deserialize it with json.loads(). I assumed this round trip would work without any trouble, but as you can see below, it fails.

>>> import json
>>> from collections import namedtuple
>>> x = {"hello": 1, "goodbye": 2}
>>> y = namedtuple('Struct', x.keys())(*x.values())
>>> y
Struct(goodbye=2, hello=1)

>>> json.loads(json.dumps(y))
[2, 1]           # <= I expected this to be the same value as y above!!

Why is this json.dumps/loads round trip lossy? What function can I use to serialize this object so that deserialization preserves its original value? I tried pickle, but it fails to serialize the object.

Saqib Ali
  • json doesn't know about namedtuples. – Jean-François Fabre Jun 12 '17 at 19:16
  • JSON is not meant to deal with arbitrary user-defined object types. JSON supports "JSON objects", which correspond to Python `dict`, and "JSON arrays", which correspond to Python `list`. Apparently, the Python `json` module assumes that if you pass it a `tuple`, it gets serialized as a JSON array. But this will always be deserialized as a Python list. You can use `pickle` to serialize types returned by `namedtuple`, but you have to keep the type around. You create the type, instantiate the type, and throw away the type in one swoop: `namedtuple('Struct', x.keys())(*x.values())`. – juanpa.arrivillaga Jun 12 '17 at 19:22
  • @juanpa.arrivillaga why not make an answer out of this long comment? – Jean-François Fabre Jun 12 '17 at 19:25
  • @Jean-FrançoisFabre so, I'm pretty sure this is a duplicate somewhere, and I was looking for it, but still haven't found it. I may be misremembering things. I think your answer is fine, I would undelete it. – juanpa.arrivillaga Jun 12 '17 at 19:26
  • @juanpa.arrivillaga I found a similar Q&A but it's not a duplicate because it doesn't _explain_ why the field names are lost. – Jean-François Fabre Jun 12 '17 at 19:42
  • juanpa.arrivillaga, Ok. I want to use `pickle` as you suggested. Can you show me how to keep the type around such that it can be used by `pickle.dumps()`? Or is there a way I can define the serializer when I create the namedtuple? – Saqib Ali Jun 12 '17 at 20:10

1 Answer

json serializes an object according to its type. It cannot serialize arbitrary objects, only the "basic" ones, such as tuple (written with square brackets, like a list), dict, and list, plus of course integers, strings, and floats.

When your object is tested with isinstance, it matches tuple, because the classes created by namedtuple are designed to inherit from tuple:

from collections import namedtuple
x = {"hello": 1, "goodbye": 2}
y = namedtuple('Struct', x.keys())(*x.values())
print(isinstance(y, tuple))

The result is True.

So in the encoder.py file of the json module, your data matches the isinstance test in the iterencode method (code extract below, around line 311 in my Python 3.4 version):

    if isinstance(value, (list, tuple)):
        chunks = _iterencode_list(value, _current_indent_level)

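So any type inheriting from list or tuple is serialized like a list, that is, as a JSON array. A JSON array carries no field names, and json.loads() always turns it back into a plain Python list, which is why the round trip loses the namedtuple.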
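To make the effect concrete, here is a small sketch of the lossy round trip. It keeps the `Struct` type under a name and fixes the field order explicitly (both are choices made for this illustration, not part of the original question) so the output does not depend on dict ordering:

```python
import json
from collections import namedtuple

# Field order is fixed here for illustration; the question built it from x.keys().
Struct = namedtuple("Struct", ["hello", "goodbye"])
y = Struct(hello=1, goodbye=2)

encoded = json.dumps(y)        # the namedtuple is treated as a plain tuple
decoded = json.loads(encoded)  # a JSON array always comes back as a list

print(encoded)                 # [1, 2]
print(type(decoded).__name__)  # list
```

The field names never reach the JSON text, so there is nothing for json.loads() to restore them from.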

A workaround for that is proposed here: Serializing a Python namedtuple to json
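One possible workaround, not necessarily the one given in the linked Q&A, is to serialize the namedtuple as a dict with `_asdict()` and rebuild it on the way back. This is only a sketch: `dumps_struct` and `loads_struct` are hypothetical helper names, and it assumes you keep the `Struct` type around instead of creating and discarding it in one expression.

```python
import json
from collections import namedtuple

# Keep the type under a name so it can be reused when decoding.
Struct = namedtuple("Struct", ["hello", "goodbye"])


def dumps_struct(obj):
    """Hypothetical helper: encode a Struct as a JSON object, keeping field names."""
    return json.dumps(obj._asdict())


def loads_struct(text):
    """Hypothetical helper: rebuild a Struct from the JSON object written above."""
    return Struct(**json.loads(text))


y = Struct(hello=1, goodbye=2)
restored = loads_struct(dumps_struct(y))

print(restored == y)  # True
```

Because the JSON text now holds the field names, the decoder has enough information to reconstruct the original value, as long as it knows which namedtuple type to build.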

Jean-François Fabre