This universal solution is useful also for really huge data: if the result string cannot fit easily in memory, it can still be easily written to a stream from a JSON iterator. (This is better than "import simplejson ...", which can help, but not much.)
Tested with Python 2.7, 3.0, 3.3, 3.6 and 3.10.0a7. Two times faster than simplejson. Small memory footprint. Unit tests are included below.
import itertools


class SerializableGenerator(list):
    """Generator that is serializable by JSON"""

    def __init__(self, iterable):
        tmp_body = iter(iterable)
        try:
            self._head = iter([next(tmp_body)])
            self.append(tmp_body)
        except StopIteration:
            self._head = []

    def __iter__(self):
        return itertools.chain(self._head, *self[:1])
Normal usage (little memory is needed for the input, but the whole output string is still built in memory):
>>> json.dumps(SerializableGenerator(iter([1, 2])))
"[1, 2]"
>>> json.dumps(SerializableGenerator(iter([])))
"[]"
For really huge data it can be used as a generator of JSON chunks in Python 3 and still use very little memory:
>>> iter_json = json.JSONEncoder().iterencode(SerializableGenerator(iter(range(1000000))))
>>> for chunk in iter_json:
... stream.write(chunk)
# or a naive example
>>> tuple(iter_json)
('[1', ', 2', ... ', 1000000', ']')
The class is used internally by a normal JSONEncoder().encode(...) called by json.dumps(...), or explicitly by JSONEncoder().iterencode(...) to get a generator of JSON chunks instead. (The function iter() in the examples is not necessary for it to work; it only demonstrates a non-trivial input that has no known length.)
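The function json.dump(...) is not shown above, but because it also writes the chunks produced by iterencode() to the file object one by one, it should stream with the same small memory footprint. A minimal sketch (writing to an io.StringIO only for demonstration; the class is repeated to keep the snippet self-contained):

```python
import io
import itertools
import json


class SerializableGenerator(list):
    """Generator that is serializable by JSON"""

    def __init__(self, iterable):
        tmp_body = iter(iterable)
        try:
            self._head = iter([next(tmp_body)])
            self.append(tmp_body)
        except StopIteration:
            self._head = []

    def __iter__(self):
        return itertools.chain(self._head, *self[:1])


# json.dump() writes the chunks from iterencode() to the file object
# one by one, so the whole output string is never built in memory.
buf = io.StringIO()
json.dump(SerializableGenerator(iter(range(5))), buf)
print(buf.getvalue())  # [0, 1, 2, 3, 4]
```

In a real application buf would be an open file or a network stream instead of an in-memory buffer.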
Test:
import unittest
import json
# from ?your_module? import SerializableGenerator


class Test(unittest.TestCase):

    def combined_dump_assert(self, iterable, expect):
        self.assertEqual(json.dumps(SerializableGenerator(iter(iterable))), expect)

    def combined_iterencode_assert(self, iterable, expect):
        encoder = json.JSONEncoder().iterencode
        self.assertEqual(tuple(encoder(SerializableGenerator(iter(iterable)))), expect)

    def test_dump_data(self):
        self.combined_dump_assert(iter([1, "a"]), '[1, "a"]')

    def test_dump_empty(self):
        self.combined_dump_assert(iter([]), '[]')

    def test_iterencode_data(self):
        self.combined_iterencode_assert(iter([1, "a"]), ('[1', ', "a"', ']'))

    def test_iterencode_empty(self):
        self.combined_iterencode_assert(iter([]), ('[]',))

    def test_that_all_data_are_consumed(self):
        gen = SerializableGenerator(iter([1, 2]))
        list(gen)
        self.assertEqual(list(gen), [])
This solution is inspired by three older answers: Vadim Pushtaev (a problem with an empty iterable), user1158559 (unnecessarily complicated) and Claude (in another question, also complicated).
Important differences from these solutions are:
- Important methods __len__, __bool__ and others are inherited consistently from a meaningfully initialized list class.
- The first item of the input is evaluated immediately by __init__ (not lazily triggered by many other methods), so the list class knows at once whether the iterator is empty or not. A non-empty list contains one item (the generator); the list stays empty if the iterator is empty.
- The correct implementation of the length for an empty iterator is important for the JSONEncoder.iterencode(...) method.
- All other methods give a meaningful output, e.g. __repr__:

>>> SerializableGenerator((x for x in range(3)))
[<generator object <genexpr> at 0x........>]
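The inherited __len__ and __bool__ behavior from the first points can be checked directly; a small demonstration (the class is repeated only to keep the snippet self-contained):

```python
import itertools


class SerializableGenerator(list):
    """Generator that is serializable by JSON"""

    def __init__(self, iterable):
        tmp_body = iter(iterable)
        try:
            self._head = iter([next(tmp_body)])
            self.append(tmp_body)
        except StopIteration:
            self._head = []

    def __iter__(self):
        return itertools.chain(self._head, *self[:1])


empty = SerializableGenerator(iter([]))
full = SerializableGenerator(iter([1, 2, 3]))

# Inherited from list: an empty input gives a falsy, zero-length object;
# a non-empty input gives a one-item list holding the rest of the iterator.
print(len(empty), bool(empty))  # 0 False
print(len(full), bool(full))    # 1 True
```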
An advantage of this solution is that a standard JSON serializer can be used. If nested generators should be supported, then the solution with simplejson is probably the best; it also has a similar variant with iterencode(...).
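If only the standard json module is available, one workaround for nesting (my own sketch, not part of the original solution) is to wrap every inner generator explicitly as well:

```python
import itertools
import json


class SerializableGenerator(list):
    """Generator that is serializable by JSON"""

    def __init__(self, iterable):
        tmp_body = iter(iterable)
        try:
            self._head = iter([next(tmp_body)])
            self.append(tmp_body)
        except StopIteration:
            self._head = []

    def __iter__(self):
        return itertools.chain(self._head, *self[:1])


# Each inner generator must be wrapped explicitly; a plain inner
# generator would still raise TypeError in the standard encoder.
nested = SerializableGenerator(
    SerializableGenerator(iter(range(row, row + 2))) for row in (0, 10)
)
result = json.dumps(nested)
print(result)  # [[0, 1], [10, 11]]
```

This keeps the memory advantage for the outer level, but it is more verbose than the simplejson variant.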
Stub *.pyi for strong typing:

from typing import Any, Iterable, Iterator


class SerializableGenerator(list):
    def __init__(self, iterable: Iterable[Any]) -> None: ...
    def __iter__(self) -> Iterator: ...