In a project I'm contributing to, we have a very simple but important class (let's call it LegacyClass). Modifying it would be a long process.
I'm contributing new dataclasses (like NormalDataclass) to this project, and I need to be able to serialize them to JSON. I don't have access to the JSON encoder, so I cannot specify a custom encoder.

Here is some sample code:

import dataclasses
import collections.abc
import json

#region I cannot easily change this code
class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    
    def __iter__(self):
        yield self.a
        yield self.b
    
    def __repr__(self):
        return f"({self.a}, {self.b})"
#endregion

#region I can do whatever I want to this part of code
@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass


legacy_class = LegacyClass('a', 'b')
normal_dataclass = NormalDataclass(legacy_class)

normal_dataclass_dict = dataclasses.asdict(normal_dataclass)
#endregion

#region I cannot easily change this code
json.dumps(normal_dataclass_dict)
#endregion

What I would want to get:

{"legacy_class": {"a": "a", "b": "b"}}

What I'm getting:

TypeError: Object of type LegacyClass is not JSON serializable

Do you have any suggestions? Specifying dict_factory as an argument to dataclasses.asdict would be an option if there were not multiple levels of LegacyClass nesting, e.g.:

from typing import List, Tuple

@dataclasses.dataclass
class AnotherNormalDataclass:
    custom_class: List[Tuple[int, LegacyClass]]

Making dict_factory recursive would basically mean rewriting the dataclasses.asdict implementation.

dreptak

2 Answers

Edit: Based on the most recent edit to the question, the simplest solution would be to define your own dict() method that returns a JSON-serializable dict. In the long term, though, I'd suggest contacting the team that implements the json.dumps part to see whether they can update the encoder to handle the dataclass.

In any case, here's a working example you can use for the present scenario:

import dataclasses
import collections.abc
import json


class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __iter__(self):
        yield self.a
        yield self.b

    def __repr__(self):
        return f"({self.a}, {self.b})"


@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass

    def dict(self):
        return {'legacy_class': self.legacy_class.__dict__}


legacy_class = LegacyClass('a', 'b')
normal_dataclass = NormalDataclass(legacy_class)

normal_dataclass_dict = normal_dataclass.dict()
print(normal_dataclass_dict)
json.dumps(normal_dataclass_dict)

Output:

{'legacy_class': {'a': 'a', 'b': 'b'}}
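
For the nested case from the question (List[Tuple[int, LegacyClass]]), a hand-written dict() method gets tedious. One possible approach, shown here only as a minimal sketch, is to post-process the result of dataclasses.asdict with a small recursive helper. The name to_primitive is hypothetical and not part of any library, and the sketch assumes LegacyClass keeps its state in plain instance attributes:

```python
import collections.abc
import dataclasses
import json
from typing import Any, List, Tuple


class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __iter__(self):
        yield self.a
        yield self.b


@dataclasses.dataclass
class AnotherNormalDataclass:
    custom_class: List[Tuple[int, LegacyClass]]


def to_primitive(value: Any) -> Any:
    """Hypothetical helper: recursively replace LegacyClass instances with dicts."""
    if isinstance(value, LegacyClass):
        # Assumption: the instance attributes are the state we want to keep.
        return {k: to_primitive(v) for k, v in vars(value).items()}
    if isinstance(value, dict):
        return {k: to_primitive(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is what JSON would do anyway.
        return [to_primitive(v) for v in value]
    return value


nested = AnotherNormalDataclass([(1, LegacyClass('a', 'b'))])

# asdict handles the dataclass levels; to_primitive handles LegacyClass.
nested_dict = to_primitive(dataclasses.asdict(nested))
nested_json = json.dumps(nested_dict)
print(nested_json)  # {"custom_class": [[1, {"a": "a", "b": "b"}]]}
```

The dict handed to the other module then contains only Python primitives, so no custom encoder is needed on that side.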

If you do have access to the json.dumps call, you can pass a default argument to it. That function is called whenever the encoder finds an object that it can't serialize to JSON, for example an instance of a custom class or a datetime object.

For example:

import dataclasses
import collections.abc
import json


class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __iter__(self):
        yield self.a
        yield self.b

    def __repr__(self):
        return f"({self.a}, {self.b})"


@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass


legacy_class = LegacyClass('aa', 'bb')
normal_dataclass = NormalDataclass(legacy_class)

normal_dataclass_dict = dataclasses.asdict(normal_dataclass)


o = json.dumps(normal_dataclass_dict,
               ### ADDED ###
               default=lambda o: o.__dict__)

print(o)  # {"legacy_class": {"a": "aa", "b": "bb"}}

If you have a more complex use case, you could consider creating a default function which can check the type of each value as it gets serialized to JSON:

import dataclasses
import collections.abc
import json
from datetime import date, time
from typing import Any


class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __iter__(self):
        yield self.a
        yield self.b

    def __repr__(self):
        return f"({self.a}, {self.b})"


@dataclasses.dataclass
class NormalDataclass:
    legacy_class: LegacyClass
    my_date: date = date.min


legacy_class = LegacyClass('aa', 'bb')
normal_dataclass = NormalDataclass(legacy_class)

normal_dataclass_dict = dataclasses.asdict(normal_dataclass)


def default_func(o: Any):

    # it's a date, time, or datetime
    if isinstance(o, (date, time)):
        return o.isoformat()

    # it's an instance of a regular class (it has a `__dict__` attribute)
    if hasattr(o, '__dict__'):
        return o.__dict__

    # print a warning and return a null
    print(f'couldn\'t find an encoder for: {o!r}, type={type(o)}')
    return None


o = json.dumps(normal_dataclass_dict, default=default_func)

print(o)  # {"legacy_class": {"a": "aa", "b": "bb"}, "my_date": "0001-01-01"}
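
If the team that owns the json.dumps call prefers an encoder class over a default function, the same logic could be packaged as a json.JSONEncoder subclass and passed via the cls argument. This is only a sketch: FallbackEncoder is a hypothetical name, and unlike default_func it lets unknown types raise TypeError instead of returning null.

```python
import collections.abc
import json
from datetime import date, time


class LegacyClass(collections.abc.Iterable):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __iter__(self):
        yield self.a
        yield self.b


class FallbackEncoder(json.JSONEncoder):
    """Hypothetical encoder; default() is called only for unsupported objects."""

    def default(self, o):
        # date, time, and datetime (a subclass of date)
        if isinstance(o, (date, time)):
            return o.isoformat()
        # instances of regular classes
        if hasattr(o, '__dict__'):
            return o.__dict__
        # anything else: let the base class raise TypeError
        return super().default(o)


payload = {'legacy_class': LegacyClass('aa', 'bb'), 'my_date': date.min}
encoded = json.dumps(payload, cls=FallbackEncoder)
print(encoded)  # {"legacy_class": {"a": "aa", "b": "bb"}, "my_date": "0001-01-01"}
```
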
rv.kvetch
  • Hi, thanks for the answer. I've edited my post to make it clearer that I do not have easy access to the `json.dumps` part. I'm sending a `dict` to the module that does it. If no ideas come up for creating a `dict` made up of Python primitives (instead of `LegacyClass`), I will probably have to ask the team responsible for that module to include something like this. – dreptak Feb 21 '22 at 12:53
  • @dreptak ah I see, thanks for clarifying a bit better. I updated my post above with a current workaround you can use for the present. It involves defining your own logic for serializing the `NormalDataclass` object, and using `__dict__` to retrieve a valid `dict` representation of the `LegacyClass` object. – rv.kvetch Feb 21 '22 at 18:56
If you don't mind using a third-party dependency, you can solve the problem with mashumaro. You just need to add DataClassDictMixin and register a custom serialization / deserialization method for LegacyClass:

import dataclasses
import json

from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig

# LegacyClass is defined as in the question

@dataclasses.dataclass
class NormalDataclass(DataClassDictMixin):
    legacy_class: LegacyClass

    class Config(BaseConfig):
        serialization_strategy = {
            LegacyClass: {
                "serialize": lambda x: {"a": x.a, "b": x.b},
                "deserialize": lambda d: LegacyClass(d["a"], d["b"]),
            }
        }


legacy_class = LegacyClass('a', 'b')
normal_dataclass = NormalDataclass(legacy_class)

normal_dataclass_dict = normal_dataclass.to_dict()
print(normal_dataclass_dict)
s = json.dumps(normal_dataclass_dict)
print(s)
tikhonov_a