
Dataclass example:

from dataclasses import dataclass
from typing import List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]

JSON example:

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

I can unpack the JSON doing something like this:

object = List(**json)

But I'm not sure how I can also unpack the statuses into StatusElement objects and append them to the statuses list of the List object. I'm sure I need to loop over them somehow, but I'm not sure how to combine that with unpacking.
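A minimal standard-library sketch of the loop-plus-unpacking idea: unpack each nested dict with a list comprehension, then unpack the remaining top-level keys, overriding the raw `statuses` list. The dataclasses mirror the question's; `list[StatusElement]` assumes Python 3.9+, and `json_data` is just a renamed copy of the question's `json` dict.

```python
from dataclasses import dataclass


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: list[StatusElement]


json_data = {
    "id": "124",
    "statuses": [
        {"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}
    ],
}

# Build StatusElement objects from the nested dicts, then unpack the top-level
# dict with the converted list replacing the raw "statuses" value.
statuses = [StatusElement(**s) for s in json_data["statuses"]]
obj = List(**{**json_data, "statuses": statuses})
print(obj)
# List(id='124', statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
```

Note that `id` stays the string `'124'`: plain dataclasses do not validate or convert field types.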

Zach Johnson

4 Answers


Python's dataclasses module is great, but one thing it unfortunately doesn't handle is parsing a JSON object into a nested dataclass structure.

A few workarounds exist for this:

  • You can roll your own JSON parsing helper method, for example a from_json which converts a JSON string to a List instance with a nested dataclass.
  • You can use an existing JSON serialization library. For example, pydantic is a popular one that supports this use case.
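A minimal sketch of the roll-your-own route, using only the standard library. `from_json` and `_convert` are hypothetical names, not part of any library; the sketch only handles nested dataclasses, `list[...]` fields and a simple str-to-int coercion, and assumes Python 3.9+ for the `list[...]` syntax.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints


def _convert(tp: Any, value: Any) -> Any:
    """Recursively convert a decoded JSON value to match the annotated type `tp`."""
    if is_dataclass(tp) and isinstance(value, dict):
        hints = get_type_hints(tp)
        # Keep only keys that correspond to declared dataclass fields
        kwargs = {
            f.name: _convert(hints[f.name], value[f.name])
            for f in fields(tp)
            if f.name in value
        }
        return tp(**kwargs)
    if get_origin(tp) is list and isinstance(value, list):
        (item_type,) = get_args(tp)
        return [_convert(item_type, item) for item in value]
    if tp is int and isinstance(value, str):
        return int(value)  # e.g. "124" -> 124
    return value  # anything else is passed through unchanged


def from_json(cls: type, text: str) -> Any:
    """Hypothetical helper: parse a JSON string into `cls`, including nested dataclasses."""
    return _convert(cls, json.loads(text))


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: list[StatusElement]


raw = '{"id": "124", "statuses": [{"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}]}'
print(from_json(List, raw))
# List(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
```

The recursion is driven by the type annotations, so deeper nesting works without hard-coding field names; unions, optionals and dicts would need extra branches.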

Here is an example using the dataclass-wizard library that works well enough for your use case. It's more lightweight than pydantic and also a little faster. It also supports automatic case transforms and type conversions (for example, str to an annotated int).

Example below:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import JSONWizard


@dataclass
class List(JSONWizard):
    id: int
    statuses: PyList['StatusElement']
    # on Python 3.9+ you can use the following syntax:
    #   statuses: list['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}


object = List.from_dict(json)

print(repr(object))
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])
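The example above relies on the library mapping the JSON key `orderIndex` to the field `order_index`. To make that "case transform" concrete, here is a standard-library sketch of the same key mapping; `camel_to_snake` is a hypothetical helper and is not how dataclass-wizard implements it.

```python
import re
from dataclasses import dataclass


def camel_to_snake(name: str) -> str:
    """Hypothetical helper: convert e.g. 'orderIndex' to 'order_index'."""
    # Insert an underscore before an uppercase letter that follows a lowercase letter or digit
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


raw = {"status": "to do", "orderIndex": 0, "color": "#d3d3d3", "type": "open"}
# Rename every key before unpacking it into the dataclass
elem = StatusElement(**{camel_to_snake(k): v for k, v in raw.items()})
print(elem)
# StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')
```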

Disclaimer: I am the creator (and maintainer) of this library.


As of the latest release of dataclass-wizard, you can skip the class inheritance. It's straightforward to use; I've taken the same example from above and removed the JSONWizard usage from it completely. Just make sure to import asdict from dataclass_wizard rather than from the dataclasses module, even though I guess the latter might coincidentally work.

Here's the modified version of the above without class inheritance:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import fromdict, asdict


@dataclass
class List:
    id: int
    statuses: PyList['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

# De-serialize the JSON dictionary into a `List` instance.
c = fromdict(List, json)

print(c)
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

# Convert the instance back to a dictionary object that is JSON-serializable.
d = asdict(c)

print(d)
# {'id': 124, 'statuses': [{'status': 'to do', 'orderIndex': 0, 'color': '#d3d3d3', 'type': 'open'}]}
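For comparison, the standard library's `dataclasses.asdict` also recurses into nested dataclasses (and lists of them), but it keeps the Python field names, so the key stays `order_index` rather than the camelCase `orderIndex` shown above. A quick check (Python 3.9+ assumed for `list[...]`):

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: list[StatusElement]


c = List(id=124, statuses=[StatusElement("to do", 0, "#d3d3d3", "open")])

# The standard-library asdict converts nested dataclasses to dicts,
# but performs no key-case transformation.
print(dataclasses.asdict(c))
# {'id': 124, 'statuses': [{'status': 'to do', 'order_index': 0, 'color': '#d3d3d3', 'type': 'open'}]}
```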

Also, here's a quick performance comparison with dacite. I wasn't aware of this library before, but it's also very easy to use (and there's no need to inherit from any class either). However, in my personal tests (Windows 10 Alienware PC, Python 3.9.1), dataclass-wizard seemed to perform much better overall in the de-serialization process.

from dataclasses import dataclass
from timeit import timeit
from typing import List

from dacite import from_dict

from dataclass_wizard import JSONWizard, fromdict


data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


class ListWiz(List, JSONWizard):
    ...


n = 100_000

# 0.37
print('dataclass-wizard:            ', timeit('ListWiz.from_dict(data)', number=n, globals=globals()))

# 0.36
print('dataclass-wizard (fromdict): ', timeit('fromdict(List, data)', number=n, globals=globals()))

# 11.2
print('dacite:                      ', timeit('from_dict(List, data)', number=n, globals=globals()))


lst_wiz1 = ListWiz.from_dict(data)
lst_wiz2 = fromdict(List, data)
lst = from_dict(List, data)

# All three approaches produce equal results
assert lst.__dict__ == lst_wiz1.__dict__ == lst_wiz2.__dict__
rv.kvetch
  • This looks really slick. I've looked at pydantic and it seems a bit heavy for what I'm trying to do. I'll have to give this library a shot. Thanks! – Zach Johnson Sep 04 '21 at 22:59
  • Can confirm it's really faster, almost as fast as the default approach: `dacite - .040896 ms`, `dataclass_wizard - .002921 ms`, `double ** extraction in a loop - .001776 ms` – Yu Da Chi May 15 '22 at 17:47
  • Considering you added the no-inheritance support over 1.5 years ago, do we still need that part of the answer? – Camilo Terevinto Apr 26 '23 at 14:34
  • @CamiloTerevinto good point. I'll see if I can update the answer to instead include a link to the docs with no-inheritance support. – rv.kvetch Apr 27 '23 at 15:45

A "cleaner" solution, in my eyes: use dacite.

There's no need to inherit from anything.

from dataclasses import dataclass
from typing import List
from dacite import from_dict

data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


lst: List = from_dict(List, data)
print(lst)

Output:

List(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
balderman
  • This is also a very cool solution; I'll admit I hadn't tried `dacite` before. However, in my personal tests `dacite` ended up being about **30x slower** in the de-serialization process (though I might be missing an optimization step). – rv.kvetch Sep 05 '21 at 16:23
  • But if we absolutely need to, we can also call `fromdict(List, data)` without extending from any class, using the import `from dataclass_wizard.loaders import fromdict`. Just note that this is technically not public API, so it might change in a future release. – rv.kvetch Sep 05 '21 at 17:27
  • Just a note: I took the suggestion about inheritance being unnecessary to heart. The latest version of `dataclass-wizard` should now support a `fromdict`, so regular dataclasses should work as well. I updated my answer above. – rv.kvetch Sep 06 '21 at 18:53
  • Wondering why `dacite` is so slow. – Yu Da Chi May 15 '22 at 17:50

I've spent a few hours investigating options for this. There's no native Python functionality for it, but there are a few third-party packages (as of November 2022):

  • marshmallow_dataclass has this functionality (you need not be using marshmallow in any other capacity in your project). It gives good error messages and the package is actively maintained. I used this for a while before hitting what I believe is a bug parsing a large and complex JSON into deeply nested dataclasses, and then had to switch away.
  • dataclass-wizard is easy to use and specifically addresses this use case. It has excellent documentation. One significant disadvantage is that it won't automatically attempt to find the right fit for a given JSON, if trying to match against a union of dataclasses (see https://dataclass-wizard.readthedocs.io/en/latest/common_use_cases/dataclasses_in_union_types.html). Instead it asks you to add a "tag key" to the input JSON, which is a robust solution but may not be possible if you have no control over the input JSON.
  • dataclasses-json is similar to dataclass-wizard, and again doesn't attempt to match the correct dataclass within a union.
  • dacite is the option I have settled on for the time being. It has similar functionality to marshmallow_dataclass, at least for JSON parsing. Its error messages are significantly less clear than marshmallow_dataclass's, but slightly offsetting this, it's easier to figure out what's wrong if you drop into pdb at the point where the error occurs: the internals are quite clear and you can experiment to see what's going wrong. According to others it is rather slow, but that's not a problem in my circumstances.
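To illustrate what "finding the right fit" within a union means, here is a hypothetical standard-library sketch (not how dacite or any of the libraries above actually implement it). `match_union`, `Task` and `Note` are made-up names; the helper tries each candidate dataclass in order and builds the first one whose required fields are all present and which accepts every key.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, Optional


@dataclass
class Task:
    title: str
    done: bool


@dataclass
class Note:
    text: str


def match_union(candidates: tuple, data: dict[str, Any]) -> Optional[Any]:
    """Hypothetical helper: build the first candidate dataclass that fits `data`."""
    for cls in candidates:
        names = {f.name for f in fields(cls)}
        required = {
            f.name
            for f in fields(cls)
            if f.default is MISSING and f.default_factory is MISSING
        }
        # Accept the class only if no key is unknown and no required field is missing
        if set(data) <= names and required <= set(data):
            return cls(**data)
    return None  # no candidate fits


print(match_union((Task, Note), {"text": "buy milk"}))
# Note(text='buy milk')
```

Matching on keys alone is ambiguous when two dataclasses share field names, which is why a tag key, as dataclass-wizard asks for, is the more robust choice when you control the input.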
Chris J Harris

One way to achieve this is to implement a classmethod in each dataclass.

from dataclasses import dataclass
import inspect

@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str

    @classmethod
    def init(cls, jsonElement):
        # Keep only the keys that match the dataclass's constructor parameters
        params = inspect.signature(cls).parameters
        data = {k: v for k, v in jsonElement.items() if k in params}
        return cls(**data)

@dataclass
class List:
    id: int
    statuses: list[StatusElement]

    @classmethod
    def init(cls, jsonElement):
        params = inspect.signature(cls).parameters
        data = {}
        for k, v in jsonElement.items():
            if k not in params:
                continue  # skip keys the dataclass doesn't define
            if k == 'statuses':
                # Convert each nested dict into a StatusElement
                data[k] = [StatusElement.init(item) for item in v]
            else:
                data[k] = v
        return cls(**data)

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    },
    {
      "status": "to do next",
      "orderindex": 1,
      "color": "#d4d4d4",
      "type": "pending"
    }]
}

object = List.init(json)
print(object)
# List(id='124', statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open'), StatusElement(status='to do next', orderindex=1, color='#d4d4d4', type='pending')])
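A variant of the same per-class idea, sketched here as an alternative rather than as part of the answer above, is to do the conversion in `__post_init__`, which dataclasses call right after the generated `__init__`. This lets the question's original `List(**json)` call work unchanged (Python 3.9+ assumed for `list[...]`; `json_data` is a renamed copy of the question's dict).

```python
from dataclasses import dataclass


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: list[StatusElement]

    def __post_init__(self):
        # Coerce the id ("124" -> 124) and turn any raw dicts into StatusElement objects
        self.id = int(self.id)
        self.statuses = [
            s if isinstance(s, StatusElement) else StatusElement(**s)
            for s in self.statuses
        ]


json_data = {
    "id": "124",
    "statuses": [
        {"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}
    ],
}
obj = List(**json_data)
print(obj)
# List(id=124, statuses=[StatusElement(status='to do', orderindex=0, color='#d3d3d3', type='open')])
```

Unlike the `init` classmethods, this does not filter out unknown keys: an extra top-level key in the JSON makes `List(**json_data)` raise a `TypeError`.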