5

Consider this JSON file named h.json. I want to convert it into a Python dataclass.

{
    "acc1":{
        "email":"acc1@example.com",
        "password":"acc1",
        "name":"ACC1",
        "salary":1
    },
    "acc2":{
        "email":"acc2@example.com",
        "password":"acc2",
        "name":"ACC2",
        "salary":2
    }

}

I could use an alternative constructor for getting each account, for example:

import json
from dataclasses import dataclass

@dataclass
class Account(object):
    email:str
    password:str
    name:str
    salary:int
    
    @classmethod
    def from_json(cls, json_key):
        # Load the whole file, then build an Account from the requested key
        with open("h.json") as fp:
            data = json.load(fp)
        return cls(**data[json_key])

But this is limited to the arguments (email, name, etc.) that are defined in the dataclass.

What if I were to modify the JSON to include another key, say age? The script would then raise a TypeError, specifically TypeError: __init__() got an unexpected keyword argument 'age'.
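
A minimal reproduction of that error, as a sketch that uses an in-memory dict instead of h.json (the `age` value is made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class Account:
    email: str
    password: str
    name: str
    salary: int


# Hypothetical record with one key the dataclass does not declare.
record = {"email": "acc1@example.com", "password": "acc1",
          "name": "ACC1", "salary": 1, "age": 30}

try:
    Account(**record)
except TypeError as exc:
    # The generated __init__ only accepts the declared fields.
    error_message = str(exc)
    print(error_message)
```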

Is there a way to dynamically adjust the class attributes based on the keys of the dict (json object), so that I don't have to add attributes each time I add a new key to the json?

MatthewMartin
Kanishk
  • For such flexibility, it's better to keep the data as a dict instead of trying to fit it to a class. – rdas Oct 29 '21 at 18:50
  • The point of a dataclass is that it keeps you from defining new fields like this. If you want to dynamically change what fields can be defined, you can use a class. – Nick ODell Oct 29 '21 at 18:51

3 Answers

8

Since it sounds like your data is expected to be dynamic and you want the freedom to add more fields to the JSON object without reflecting the same changes in the model, I'd also suggest checking out typing.TypedDict instead of a dataclass.

Here's an example with TypedDict, which should work in Python 3.7+. Since TypedDict was introduced in 3.8, I've instead imported it from typing_extensions so it's compatible with 3.7 code.

from __future__ import annotations

import json
from io import StringIO
from typing_extensions import TypedDict


class Account(TypedDict):
    email: str
    password: str
    name: str
    salary: int


json_data = StringIO("""{
    "acc1":{
        "email":"acc1@example.com",
        "password":"acc1",
        "name":"ACC1",
        "salary":1
    },
    "acc2":{
        "email":"acc2@example.com",
        "password":"acc2",
        "name":"ACC2",
        "salary":2,
        "someRandomKey": "string"
    }
}
""")

data = json.load(json_data)
name_to_account: dict[str, Account] = data

acct = name_to_account['acc2']

# Your IDE should be able to offer auto-complete suggestions within the
# brackets, when you start typing or press 'Ctrl + Space' for example.
print(acct['someRandomKey'])
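
The extra key goes through because a TypedDict is an ordinary dict at runtime: the annotations are only read by static type checkers, and nothing is validated when the JSON is loaded. A small sketch of that behaviour, assuming Python 3.8+ so that TypedDict can be imported from typing directly; the `AccountWithAge` class and its `age` key are hypothetical and only show how an optional key could be declared with total=False:

```python
from typing import TypedDict


class Account(TypedDict):
    email: str
    password: str
    name: str
    salary: int


class AccountWithAge(Account, total=False):
    # Hypothetical optional key: it may be absent from the JSON.
    age: int


raw = {"email": "acc2@example.com", "password": "acc2",
       "name": "ACC2", "salary": 2, "someRandomKey": "string"}

# The annotation is for the type checker only; no check or copy happens here.
acct: AccountWithAge = raw

print(type(acct) is dict)     # the object is still a plain dict
print(acct["someRandomKey"])  # the undeclared key is still present at runtime
print(acct.get("age"))        # the optional key is simply missing
```

A static type checker would likely flag the undeclared key, but at runtime the lookup works like any other dict access.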

If you are set on using dataclasses to model your data, I'd suggest checking out a JSON serialization library like the dataclass-wizard (disclaimer: I am the creator) which should handle extraneous fields in the JSON data as mentioned, as well as a nested dataclass model if you find your data becoming more complex.

It also has a handy tool that you can use to generate a dataclass schema from JSON data, which can be useful for instance if you want to update your model class whenever you add new fields in the JSON file as mentioned.
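
If an external tool is not an option, a similar effect can be sketched at runtime with the standard library's dataclasses.make_dataclass, which builds a dataclass from a list of field names. This is a different technique from the dataclass-wizard tool mentioned above, not a description of how that tool works. The helper name `account_class_for` and the `age` key are made up for illustration, and each field is typed after the value found in the sample record.

```python
from dataclasses import asdict, is_dataclass, make_dataclass


def account_class_for(record: dict):
    """Build a dataclass whose fields mirror the keys of one JSON object."""
    # Each field spec is (name, type); the type is taken from the sample value.
    field_specs = [(key, type(value)) for key, value in record.items()]
    return make_dataclass("Account", field_specs)


# Hypothetical record that already contains the new "age" key.
record = {"email": "acc1@example.com", "password": "acc1",
          "name": "ACC1", "salary": 1, "age": 30}

Account = account_class_for(record)
acct = Account(**record)

print(acct.age)
print(asdict(acct) == record)
```

The trade-off is that the class only exists at runtime, so IDEs and type checkers cannot see its fields, and JSON keys that are not valid Python identifiers are rejected by make_dataclass.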

rv.kvetch
  • wow, `TypedDict` !!! very good idea – PersianMan Oct 30 '21 at 04:42
  • yep, definitely agree, it's a cool but, I feel, rather little-known feature of `typing` :-) – rv.kvetch Oct 30 '21 at 04:43
  • @rv.kvetch, this is something that will definitely come in handy, thanks for letting me know. For my specific use case, though, I have many other methods in the `Account` class besides the alternative constructor, and inheriting from `TypedDict` limits the class body to annotations only. I also don't get type hints for `"someRandomKey"`, which is understandable as I haven't specified that field in the class. – Kanishk Oct 30 '21 at 07:26
  • Ah, that definitely makes sense. Yep, agreed, one limitation of `TypedDict` is that you can't define and use methods as you normally would. If you *are* still set on using dataclasses, I'd suggest checking out the linked library above, as it has a CLI tool you can use to convert a JSON schema to a dataclass model, which can potentially be used if you add a bunch of new JSON fields. It is actually inspired in part by the other excellent tool here: https://russbiggs.github.io/json2dataclass/ – rv.kvetch Oct 30 '21 at 22:13
  • Insanely awesome. Works really well with a JSON dict list. {"mystr": [{"mystr2": {mystr3: -999}, more dictionaries...}]}. json_data_to_class = dict[list, Myclass]= _json_data. In other words, it looks like this technique works with many JSON formats. – zerocog Jul 09 '23 at 01:57
4

With this approach you lose some dataclass features for the extra keys:

  • You can't declare whether such a key is required or optional
  • IDE auto-completion doesn't know about it

However, you know your project best, so decide accordingly.

There are probably many ways to do this; here is one of them:

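import json
from dataclasses import dataclass, fields


@dataclass
class Account(object):
    email: str
    password: str
    name: str
    salary: int

    @classmethod
    def from_json(cls, json_key):
        # h.json is the file from the question
        with open("h.json") as fp:
            file = json.load(fp)
        # Names of the fields declared on the dataclass
        keys = [f.name for f in fields(cls)]
        # or: keys = cls.__dataclass_fields__.keys()
        json_data = file[json_key]
        # Split the JSON keys into declared fields and everything else
        normal_json_data = {key: json_data[key] for key in json_data if key in keys}
        anormal_json_data = {key: json_data[key] for key in json_data if key not in keys}
        # Build the instance from the declared fields only
        tmp = cls(**normal_json_data)
        # Attach the remaining keys as plain instance attributes
        for anormal_key in anormal_json_data:
            setattr(tmp, anormal_key, anormal_json_data[anormal_key])
        return tmp

# Assumes the "acc1" object in the JSON file has an "age" key
test = Account.from_json("acc1")
print(test.age)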
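
One consequence of the setattr approach that may be worth seeing concretely: attributes attached after construction are not dataclass fields, so the generated __repr__ and __eq__, as well as dataclasses.asdict, ignore them. A sketch with an in-memory dict in place of the file; the `from_dict` helper and the `age` values are hypothetical:

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Account:
    email: str
    password: str
    name: str
    salary: int

    @classmethod
    def from_dict(cls, record: dict) -> "Account":
        # Hypothetical helper: same idea as from_json, minus the file access.
        known = {f.name for f in fields(cls)}
        obj = cls(**{k: v for k, v in record.items() if k in known})
        for key, value in record.items():
            if key not in known:
                setattr(obj, key, value)
        return obj


record = {"email": "acc1@example.com", "password": "acc1",
          "name": "ACC1", "salary": 1, "age": 30}
acct = Account.from_dict(record)
other = Account.from_dict({**record, "age": 99})

print(acct.age)               # the extra attribute is reachable
print("age" in asdict(acct))  # but asdict() only walks declared fields
print("age" in repr(acct))    # and the generated repr skips it too
print(acct == other)          # equality compares declared fields only
```

If the extra keys need to survive serialization or comparison, keeping them in a dict (or declaring them as fields) is probably the safer choice.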
PersianMan
  • The `__dataclass_fields__` attribute is internal to the dataclasses module and could change at any time; you should prefer to use `dataclasses.fields` here instead (which *is* documented). – rv.kvetch Oct 29 '21 at 22:29
  • @rv.kvetch thanks, but for this usage it doesn't matter – PersianMan Oct 30 '21 at 04:34
  • well, it would certainly matter if the name were ever changed slightly - for example to `__fields__`. The point is that you can't rely on internal attributes because they might change in a future revision. – rv.kvetch Oct 30 '21 at 04:37
  • @rv.kvetch yes, that's true, thanks – PersianMan Oct 30 '21 at 04:41
  • @PersianMan, this is what I was looking for, thanks; I didn't know about the `setattr()` function. About the caveats of using this (you gave two above): could you explain the first one (sorry, I'm new to Python)? The second one is most probably an IDE thing, and I'm not concerned with that at the moment. Thanks for your time (also, is there anything else you'd want to point out?) – Kanishk Oct 30 '21 at 07:31
  • @rv.kvetch right now you can't create a new `Account` object without password, email, etc., because those are required fields, whereas `age` is optional. For example, if you want to make `email` an optional field (meaning you can create a new `Account` without passing `email` as an argument), you must give it a default value, e.g. `email: Optional[str] = None`, and move it below the fields that have no default – PersianMan Oct 30 '21 at 09:24
  • @PersianMan I think you probably meant to mention the OP above, right? – rv.kvetch Oct 30 '21 at 22:10
  • yes @Kanishk, sorry – PersianMan Oct 31 '21 at 06:40
3

For a flat (non-nested) dataclass, the code below does the job.
If you need to handle nested dataclasses, you should use a framework like dacite.

Note 1: loading the data from the JSON file should not be part of your class logic.

Note 2: if your JSON can contain anything, you cannot map it to a dataclass and will have to work with a dict.

from dataclasses import dataclass
from typing import List

data = {
    "acc1":{
        "email":"acc1@example.com",
        "password":"acc1",
        "name":"ACC1",
        "salary":1
    },
    "acc2":{
        "email":"acc2@example.com",
        "password":"acc2",
        "name":"ACC2",
        "salary":2
    }

}



@dataclass
class Account:
    email:str
    password:str
    name:str
    salary:int

accounts: List[Account] = [Account(**x) for x in data.values()]
print(accounts)

output

[Account(email='acc1@example.com', password='acc1', name='ACC1', salary=1), Account(email='acc2@example.com', password='acc2', name='ACC2', salary=2)]
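
Note 1 could be sketched like this: a standalone function does the file handling and hands plain dicts to the dataclass. The function name `load_accounts` is made up for illustration, and dropping undeclared keys is an assumption of this sketch (one possible policy), not something prescribed above.

```python
import json
from dataclasses import dataclass, fields
from io import StringIO
from typing import Dict, TextIO


@dataclass
class Account:
    email: str
    password: str
    name: str
    salary: int


def load_accounts(stream: TextIO) -> Dict[str, Account]:
    """Parse JSON from an open file-like object into Account instances."""
    known = {f.name for f in fields(Account)}
    raw = json.load(stream)
    # Assumption: keys the dataclass does not declare are silently dropped.
    return {
        key: Account(**{k: v for k, v in record.items() if k in known})
        for key, record in raw.items()
    }


# StringIO stands in for open("h.json") so the sketch runs without a file.
sample = StringIO('{"acc1": {"email": "acc1@example.com", "password": "acc1",'
                  ' "name": "ACC1", "salary": 1, "age": 30}}')
accounts = load_accounts(sample)
print(accounts["acc1"])
```

Keeping the I/O outside the class means the dataclass can be built and tested from plain dicts, whatever the data source is.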
balderman