106

What is the recommended way of serializing a namedtuple to json with the field names retained?

Serializing a namedtuple to JSON results in only the values being serialized; the field names are lost in translation. I would like the field names to be retained as well when JSON-ized, and hence did the following:

class foobar(namedtuple('f', 'foo, bar')):
    __slots__ = ()
    def __iter__(self):
        yield self._asdict()

The above serializes to JSON as I expect and behaves like a namedtuple in the other places I use it (attribute access, etc.), except that iterating over it yields non-tuple-like results (which is fine for my use case).

What is the "correct way" of converting to json with the field names retained?

Russ
calvinkrishy
  • for python 2.7: https://stackoverflow.com/questions/16938456/serializing-a-nested-namedtuple-into-json-with-python-2-7 – lowtech Sep 18 '18 at 22:03

12 Answers

103

If it's just one namedtuple you're looking to serialize, using its _asdict() method will work (with Python >= 2.7)

>>> from collections import namedtuple
>>> import json
>>> FB = namedtuple("FB", ("foo", "bar"))
>>> fb = FB(123, 456)
>>> json.dumps(fb._asdict())
'{"foo": 123, "bar": 456}'
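Going the other way is a matter of unpacking the decoded dictionary into the type. A minimal sketch, assuming the JSON keys match the field names exactly (the variable names `encoded` and `restored` are only illustrative):

```python
import json
from collections import namedtuple

FB = namedtuple("FB", ("foo", "bar"))
fb = FB(123, 456)

# Encode: _asdict() keeps the field names as JSON object keys.
encoded = json.dumps(fb._asdict())

# Decode: unpack the parsed dict into the namedtuple constructor.
# This raises TypeError if the keys do not match the field names.
restored = FB(**json.loads(encoded))
```

This only covers a flat namedtuple; nested namedtuples would still come out as lists.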
benselme
59

This is pretty tricky, since namedtuple() is a factory which returns a new type derived from tuple. One approach would be to have your class also inherit from UserDict.DictMixin, but tuple.__getitem__ is already defined and expects an integer denoting the position of the element, not the name of its attribute:

>>> f = foobar('a', 1)
>>> f[0]
'a'

At its heart the namedtuple is an odd fit for JSON, since it is really a custom-built type whose key names are fixed as part of the type definition, unlike a dictionary where key names are stored inside the instance. This prevents you from "round-tripping" a namedtuple, e.g. you cannot decode a dictionary back into a namedtuple without some other piece of information, like an app-specific type marker in the dict {'a': 1, '#_type': 'foobar'}, which is a bit hacky.
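To make the type-marker idea concrete, here is a rough sketch of a round trip using the standard `object_hook` parameter of `json.loads`. The registry `TYPES`, the marker key `#_type`, and both helper functions are hypothetical names chosen for illustration, not an established convention:

```python
import json
from collections import namedtuple

Foobar = namedtuple("Foobar", "foo bar")

# Hypothetical registry mapping marker strings to namedtuple types.
TYPES = {"Foobar": Foobar}
MARKER = "#_type"

def to_marked_dict(nt):
    """Return the fields of a namedtuple plus a marker naming its type."""
    d = dict(nt._asdict())
    d[MARKER] = type(nt).__name__
    return d

def from_marked_dict(d):
    """object_hook: rebuild a namedtuple when a known marker is present."""
    name = d.pop(MARKER, None)
    if name in TYPES:
        return TYPES[name](**d)
    if name is not None:
        d[MARKER] = name  # unknown marker: leave the dict untouched
    return d

text = json.dumps(to_marked_dict(Foobar("a", 1)))
restored = json.loads(text, object_hook=from_marked_dict)
```

The decoder has to know every type in advance, which is exactly the extra piece of information the paragraph above refers to.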

This is not ideal, but if you only need to encode namedtuples into dictionaries, another approach is to extend or modify your JSON encoder to special-case these types. Here is an example of subclassing the Python json.JSONEncoder. This tackles the problem of ensuring that nested namedtuples are properly converted to dictionaries:

from collections import namedtuple
from json import JSONEncoder

class MyEncoder(JSONEncoder):

    def _iterencode(self, obj, markers=None):
        if isinstance(obj, tuple) and hasattr(obj, '_asdict'):
            gen = self._iterencode_dict(obj._asdict(), markers)
        else:
            gen = JSONEncoder._iterencode(self, obj, markers)
        for chunk in gen:
            yield chunk

class foobar(namedtuple('f', 'foo, bar')):
    pass

enc = MyEncoder()
for obj in (foobar('a', 1), ('a', 1), {'outer': foobar('x', 'y')}):
    print enc.encode(obj)

{"foo": "a", "bar": 1}
["a", 1]
{"outer": {"foo": "x", "bar": "y"}}
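On Python versions where `_iterencode` is not available as an overridable method, one possible workaround is to convert the data before handing it to the encoder. A minimal Python 3 sketch, assuming the data consists only of namedtuples, dicts, lists, tuples, and plain JSON values (`to_jsonable` is a hypothetical helper name, not part of the `json` module):

```python
import json
from collections import namedtuple

def to_jsonable(obj):
    """Recursively replace namedtuples with dicts; leave other values alone."""
    if isinstance(obj, tuple) and hasattr(obj, "_asdict"):
        return {k: to_jsonable(v) for k, v in obj._asdict().items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(v) for v in obj]
    if isinstance(obj, dict):
        return {k: to_jsonable(v) for k, v in obj.items()}
    return obj

Foobar = namedtuple("Foobar", "foo bar")

for obj in (Foobar("a", 1), ("a", 1), {"outer": Foobar("x", "y")}):
    print(json.dumps(to_jsonable(obj)))
```

The trade-off is an extra pass over the data that builds a converted copy before encoding.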
samplebias
  • 17
    _At its heart the namedtuple is an odd fit for JSON, since it is really a custom-built type whose key names are fixed as part of the type definition, unlike a dictionary where key names are stored inside the instance._ Very insightful comment. I had not thought about that. Thanks. I like namedtuples since they provide a nice immutable structure *with* attribute naming convenience. I will accept your answer. Having said that, Java's serialization mechanism provides more control over _how_ the object is serialized and I am curious to know why such hooks does not seem to exist in Python. – calvinkrishy May 06 '11 at 19:05
  • 1
    That was my first approach, but it doesn't actually work (for me anyways). – Zach Kelling May 06 '11 at 19:05
  • 2
    `>>> json.dumps(foobar('x', 'y'), cls=MyEncoder)` `<<< '["x", "y"]'` – Zach Kelling May 06 '11 at 19:06
  • Using python 2.7.1, I've tried with both built-in json and simplejson. – Zach Kelling May 06 '11 at 19:09
  • 22
    Ah, in python 2.7+ _iterencode is no longer a method of JSONEncoder. – Zach Kelling May 06 '11 at 19:47
  • 2
    @calvin Thanks, I find the namedtuple useful as well, wish there were a better solution to encode it recursively to JSON. @zeekay Yep, seems in 2.7+ they hide it so it can no longer be overridden. That is disappointing. – samplebias May 06 '11 at 19:55
  • https://stackoverflow.com/a/45130044/1492613 it does not work any more, now the only entry point is default(), but the custom default() won't get called if json think it knows the default() – Wang Nov 26 '20 at 19:29
22

It looks like you used to be able to subclass simplejson.JSONEncoder to make this work, but with the latest simplejson code, that is no longer the case: you have to actually modify the project code. I see no reason why simplejson should not support namedtuples, so I forked the project, added namedtuple support, and I'm currently waiting for my branch to be pulled back into the main project. If you need the fixes now, just pull from my fork.

EDIT: Looks like the latest versions of simplejson now natively support this with the namedtuple_as_object option, which defaults to True.

singingwolfboy
  • 3
    Your edit is the correct answer. simplejson serializes namedtuples differently (my opinion: better) than json. This really makes the pattern: "try: import simplejson as json except: import json", risky since you might get different behavior on some machines depending on if simplejson is installed. For that reason, I now require simplejson in a lot of my setup files and abstain from that pattern. – marr75 Sep 05 '12 at 15:12
  • 2
    @marr75 - Ditto for `ujson`, which is even more bizarre and unpredictable in such edge cases... – mac Nov 19 '14 at 09:31
  • I was able to get a recursive namedtuple serialized to (pretty-printed) json using: `simplejson.dumps(my_tuple, indent=4)` – KFL Oct 23 '18 at 21:30
6

I wrote a library for doing this: https://github.com/ltworf/typedload

It can go from and to named-tuple and back.

It supports quite complicated nested structures, with lists, sets, enums, unions, default values. It should cover most common cases.

edit: The library also supports dataclass and attr classes.

LtWorf
5

A more convenient solution is to use a decorator (it uses the protected field _fields).

Python 2.7+:

import json
from collections import namedtuple, OrderedDict

def json_serializable(cls):
    def as_dict(self):
        yield OrderedDict(
            (name, value) for name, value in zip(
                self._fields,
                iter(super(cls, self).__iter__())))
    cls.__iter__ = as_dict
    return cls

#Usage:

C = json_serializable(namedtuple('C', 'a b c'))
print json.dumps(C('abc', True, 3.14))

# or

@json_serializable
class D(namedtuple('D', 'a b c')):
    pass

print json.dumps(D('abc', True, 3.14))

Python 3.6.6+:

import json
from typing import NamedTuple

def json_serializable(cls):
    def as_dict(self):
        yield {name: value for name, value in zip(
            self._fields,
            iter(super(cls, self).__iter__()))}
    cls.__iter__ = as_dict
    return cls

# Usage:

@json_serializable
class C(NamedTuple):
    a: str
    b: bool
    c: float

print(json.dumps(C('abc', True, 3.14)))
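If changing how the tuple iterates is undesirable, a variation on the decorator idea is to attach an explicit method instead of replacing `__iter__`. A minimal sketch; the names `with_to_json` and `to_json` are hypothetical, and it relies only on the documented `_asdict()` method:

```python
import json
from typing import NamedTuple

def with_to_json(cls):
    """Class decorator that adds a to_json() method to a namedtuple type."""
    def to_json(self, **kwargs):
        # Extra keyword arguments are passed through to json.dumps.
        return json.dumps(self._asdict(), **kwargs)
    cls.to_json = to_json
    return cls

@with_to_json
class C(NamedTuple):
    a: str
    b: bool
    c: float

text = C("abc", True, 3.14).to_json()
print(text)
```

Iteration and unpacking keep their normal tuple behavior here, at the cost of having to call `to_json()` explicitly rather than passing the object straight to `json.dumps`.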
Dmitry T.
  • Don't do that, they change the internal API all the time. My typedload library has several cases for different py versions. – LtWorf Sep 13 '18 at 10:28
  • Yes, it's clear. However, nobody should migrate to a newer Python version without testing. And, the other solutions use `_asdict`, which is also a "protected" class member. – Dmitry T. Sep 17 '18 at 09:42
  • My point is: use my library, it runs tests on a few versions of python so it's less likely that you'll get bad surprises from that :p – LtWorf Sep 17 '18 at 12:28
  • 1
    LtWorf, your library is GPL and doesn't work with frozensets – Thomas Grainger Oct 14 '18 at 05:01
  • 2
    @LtWorf Your library also uses `_fields` ;-) https://github.com/ltworf/typedload/blob/master/typedload/datadumper.py It's part of namedtuple's public API, actually: https://docs.python.org/3.7/library/collections.html#collections.namedtuple People get confused by the underscore (no wonder!). It's bad design, but I don't know what other choice they had. – quant_dev Dec 09 '18 at 13:30
  • @ThomasGrainger my library is designed to be extended to whatever types you need. And yes it is GPL because I don't care about helping proprietary stuff. – LtWorf Dec 09 '18 at 14:32
  • @quant_dev they tend to change that API and the library supports different versions of python. – LtWorf Dec 09 '18 at 14:33
  • When was `_fields` changed, then? – quant_dev Dec 10 '18 at 15:02
  • @quant_dev https://github.com/ltworf/typedload/blob/master/typedload/dataloader.py#L38-L61 they change multiple things between versions. – LtWorf Dec 16 '18 at 01:08
  • 1
    What things? When? Can you cite release notes? – quant_dev Dec 17 '18 at 10:52
  • this approach has some problem, it make namedtuple as dict in a list. `[{"a": "abc", "b": true, "c": 3.14}]` – Wang Nov 26 '20 at 18:56
  • @quant_dev i cited the code. Nothing is in release notes as internal API changes do not get reported. – LtWorf Jun 21 '21 at 12:32
5

It's impossible to serialize namedtuples correctly with the native python json library. It will always see tuples as lists, and it is impossible to override the default serializer to change this behaviour. It's worse if objects are nested.

Better to use a more robust library like orjson:

import orjson
from typing import NamedTuple

class Rectangle(NamedTuple):
    width: int
    height: int

def default(obj):
    if hasattr(obj, '_asdict'):
        return obj._asdict()

rectangle = Rectangle(width=10, height=20)
print(orjson.dumps(rectangle, default=default))

=>

b'{"width":10,"height":20}'
mikebridge
  • 1
    i'm a fan of `orjson` too. – CircleOnCircles Nov 13 '20 at 10:10
  • 1
    orjson is great (also handles dates, dataclasses etc.), but on rare architectures (e.g. I was using it on [termux](https://termux.com/)), since its rust-based, it fails to build. So, if portability is a concern, `simplejson` may be better. Its also possible to use try to import orjson for the perf, else fallback to simplejson (since thats pure python) if that `ModuleNotFound`s - [example](https://github.com/karlicoss/HPI/blob/6185942f780c4ce7c870a7c039438ff3120bb8d6/my/core/serialize.py#L72-L111) – Sean Breckenridge Mar 05 '22 at 08:53
3

It recursively converts the namedTuple data to json.

print(m1)
## Message(id=2, agent=Agent(id=1, first_name='asd', last_name='asd', mail='2@mai.com'), customer=Customer(id=1, first_name='asd', last_name='asd', mail='2@mai.com', phone_number=123123), type='image', content='text', media_url='h.com', la=123123, ls=4512313)

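def reqursive_to_json(obj):
    _json = {}

    # Only namedtuples (tuples that provide _asdict) are converted
    if isinstance(obj, tuple) and hasattr(obj, '_asdict'):
        datas = obj._asdict()
        for data in datas:
            value = datas[data]
            if isinstance(value, tuple) and hasattr(value, '_asdict'):
                # Nested namedtuple: convert it recursively
                _json[data] = reqursive_to_json(value)
            else:
                _json[data] = value
    return _json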

data = reqursive_to_json(m1)
print(data)
{'agent': {'first_name': 'asd',
'last_name': 'asd',
'mail': '2@mai.com',
'id': 1},
'content': 'text',
'customer': {'first_name': 'asd',
'last_name': 'asd',
'mail': '2@mai.com',
'phone_number': 123123,
'id': 1},
'id': 2,
'la': 123123,
'ls': 4512313,
'media_url': 'h.com',
'type': 'image'}
Tolgahan ÜZÜN
  • 2
    +1 I made almost the same. But your return is a dict not json. You must have " not ', and if a value in your object is a boolean, it will not be converted to true. I think it's safer to transform into dict, then use json.dumps to convert into json. – Fred Laurent Mar 30 '18 at 16:28
2

The jsonplus library provides a serializer for NamedTuple instances. Use its compatibility mode to output simple objects if needed, but prefer the default as it is helpful for decoding back.

Gonzalo
  • I looked at the other solutions here and found simply adding this dependency saved me a lot of time. Particularly because I had a list of NamedTuples that I needed to pass as json in the session. jsonplus lets you basically get lists of named tuples into and out of json with `.dumps()` and `.loads()` no config it just works. – Rob Jun 10 '20 at 14:59
1

This is an old question. However:

A suggestion for all those with the same question: think carefully about using any of the private or internal features of NamedTuple, because they have changed before and will change again over time.

For example, if your NamedTuple is a flat value object and you're only interested in serializing it and not in cases where it is nested into another object, you could avoid the troubles that would come up with __dict__ being removed or _asdict() changing and just do something like the following (and yes, this is Python 3, because this answer is for the present):

import json
from typing import NamedTuple

class ApiListRequest(NamedTuple):
  group: str="default"
  filter: str="*"

  def to_dict(self):
    return {
      'group': self.group,
      'filter': self.filter,
    }

  def to_json(self):
    return json.dumps(self.to_dict())

I tried to use the default callable kwarg to dumps in order to do the to_dict() call if available, but that didn't get called as the NamedTuple is convertible to a list.
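The behavior described above can be checked with a short experiment. This is a small sketch using a hypothetical `Point` type; the `calls` list only exists to record whether the hook runs:

```python
import json
from typing import NamedTuple

class Point(NamedTuple):  # hypothetical example type
    x: int
    y: int

calls = []

def default(obj):
    # Record every object the encoder hands to the fallback hook.
    calls.append(obj)
    return obj._asdict()

result = json.dumps(Point(1, 2), default=default)
print(result)  # [1, 2]
print(calls)   # []
```

Because a namedtuple is a tuple, the standard encoder already knows how to serialize it as an array and never consults `default`.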

dlamblin
  • 4
    `_asdict` is part of namedtuple public API. They explain the reason for the underscore https://docs.python.org/3.7/library/collections.html#collections.namedtuple "In addition to the methods inherited from tuples, named tuples support three additional methods and two attributes. To prevent conflicts with field names, the method and attribute names start with an underscore." – quant_dev Dec 09 '18 at 13:26
  • @quant_dev thanks, I didn't see that explanation. It's not a guarantee of api stability, but it helps make those methods more trustworthy. I do like the explicit to_dict readability, but I can see it seems like reimplementing _as_dict – dlamblin Dec 10 '18 at 15:21
1

Here is my take on the problem. It serializes the NamedTuple and takes care of nested NamedTuples and lists inside them:

from typing import Any

def recursive_to_dict(obj: Any) -> dict:
    _dict = {}

    if isinstance(obj, tuple) and hasattr(obj, "_asdict"):
        node = obj._asdict()
        for item in node:
            if isinstance(node[item], list):  # Process as a list
                _dict[item] = [
                    # Convert NamedTuple elements; keep other elements as they are
                    recursive_to_dict(x) if hasattr(x, "_asdict") else x
                    for x in node[item]
                ]
            elif getattr(node[item], "_asdict", False):  # Process as a NamedTuple
                _dict[item] = recursive_to_dict(node[item])
            else:  # Process as a regular element
                _dict[item] = node[item]
    return _dict
Dim
1

simplejson.dump() instead of json.dump does the job. It may be slower though.

Smit Johnth
0

I know this is a very old thread, but one solution I came up with for this problem is to patch or override the function json.encoder._make_iterencode with a similar custom one, extended to handle named tuples separately. I am not sure if this is good practice or if there is a standard, safer way to do the patching:

def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
        _key_separator, _item_separator, _sort_keys, _skipkeys, _one_shot,
        ValueError=ValueError,
        dict=dict,
        float=float,
        id=id,
        int=int,
        isinstance=isinstance,
        list=list,
        str=str,
        tuple=tuple,
        _intstr=int.__repr__,
    ):

    if _indent is not None and not isinstance(_indent, str):
        _indent = ' ' * _indent

    def _iterencode_list(lst, _current_indent_level):
        if not lst:
            yield '[]'
            return
        if markers is not None:
            markerid = id(lst)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = lst
        buf = '['
        if _indent is not None:
            _current_indent_level += 1
            newline_indent = '\n' + _indent * _current_indent_level
            separator = _item_separator + newline_indent
            buf += newline_indent
        else:
            newline_indent = None
            separator = _item_separator
        first = True
        for value in lst:
            if first:
                first = False
            else:
                buf = separator
            if isinstance(value, str):
                yield buf + _encoder(value)
            elif value is None:
                yield buf + 'null'
            elif value is True:
                yield buf + 'true'
            elif value is False:
                yield buf + 'false'
            elif isinstance(value, int):
                # Subclasses of int/float may override __repr__, but we still
                # want to encode them as integers/floats in JSON. One example
                # within the standard library is IntEnum.
                yield buf + _intstr(value)
            elif isinstance(value, float):
                # see comment above for int
                yield buf + _floatstr(value)
            else:
                yield buf

                # EDIT
                ##################
                if isinstance(value, tuple) and hasattr(value, '_asdict'):
                    value = value._asdict()
                    chunks = _iterencode_dict(value, _current_indent_level)
                ##################

                elif isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, _current_indent_level)
                elif isinstance(value, dict):
                    chunks = _iterencode_dict(value, _current_indent_level)
                else:
                    chunks = _iterencode(value, _current_indent_level)
                yield from chunks
        if newline_indent is not None:
            _current_indent_level -= 1
            yield '\n' + _indent * _current_indent_level
        yield ']'
        if markers is not None:
            del markers[markerid]

    def _iterencode_dict(dct, _current_indent_level):
        if not dct:
            yield '{}'
            return
        if markers is not None:
            markerid = id(dct)
            if markerid in markers:
                raise ValueError("Circular reference detected")
            markers[markerid] = dct
        yield '{'
        if _indent is not None:
            _current_indent_level += 1
            newline_indent = '\n' + _indent * _current_indent_level
            item_separator = _item_separator + newline_indent
            yield newline_indent
        else:
            newline_indent = None
            item_separator = _item_separator
        first = True
        if _sort_keys:
            items = sorted(dct.items())
        else:
            items = dct.items()
        for key, value in items:
            if isinstance(key, str):
                pass
            # JavaScript is weakly typed for these, so it makes sense to
            # also allow them.  Many encoders seem to do something like this.
            elif isinstance(key, float):
                # see comment for int/float in _make_iterencode
                key = _floatstr(key)
            elif key is True:
                key = 'true'
            elif key is False:
                key = 'false'
            elif key is None:
                key = 'null'
            elif isinstance(key, int):
                # see comment for int/float in _make_iterencode
                key = _intstr(key)
            elif _skipkeys:
                continue
            else:
                raise TypeError(f'keys must be str, int, float, bool or None, '
                                f'not {key.__class__.__name__}')
            if first:
                first = False
            else:
                yield item_separator
            yield _encoder(key)
            yield _key_separator
            if isinstance(value, str):
                yield _encoder(value)
            elif value is None:
                yield 'null'
            elif value is True:
                yield 'true'
            elif value is False:
                yield 'false'
            elif isinstance(value, int):
                # see comment for int/float in _make_iterencode
                yield _intstr(value)
            elif isinstance(value, float):
                # see comment for int/float in _make_iterencode
                yield _floatstr(value)
            else:

                # EDIT
                ###############
                if isinstance(value, tuple) and hasattr(value, '_asdict'):
                    value = value._asdict()
                    chunks = _iterencode_dict(value, _current_indent_level)
                ###############

                elif isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, _current_indent_level)
                elif isinstance(value, dict):
                    chunks = _iterencode_dict(value, _current_indent_level)
                else:
                    chunks = _iterencode(value, _current_indent_level)
                yield from chunks
        if newline_indent is not None:
            _current_indent_level -= 1
            yield '\n' + _indent * _current_indent_level
        yield '}'
        if markers is not None:
            del markers[markerid]

    def _iterencode(o, _current_indent_level):
        if isinstance(o, str):
            yield _encoder(o)
        elif o is None:
            yield 'null'
        elif o is True:
            yield 'true'
        elif o is False:
            yield 'false'
        elif isinstance(o, int):
            # see comment for int/float in _make_iterencode
            yield _intstr(o)
        elif isinstance(o, float):
            # see comment for int/float in _make_iterencode
            yield _floatstr(o)

        # EDIT
        ##################
        elif isinstance(o, tuple) and hasattr(o, '_asdict'):
            o = o._asdict()
            yield from _iterencode_dict(o, _current_indent_level)
        ##################

        elif isinstance(o, (list, tuple)):
            yield from _iterencode_list(o, _current_indent_level)
        elif isinstance(o, dict):
            yield from _iterencode_dict(o, _current_indent_level)
        else:
            if markers is not None:
                markerid = id(o)
                if markerid in markers:
                    raise ValueError("Circular reference detected")
                markers[markerid] = o
            o = _default(o)
            yield from _iterencode(o, _current_indent_level)
            if markers is not None:
                del markers[markerid]
    return _iterencode

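# alters the json lib
import json

json.encoder._make_iterencode = _make_iterencode
# json.dumps() prefers the C encoder when it is available, which bypasses
# _make_iterencode entirely; disable it so the patched function is used
json.encoder.c_make_encoder = None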

Mosty