I want to use dataclasses in Python to create a base class and several derived classes. These classes will contain complex attributes, such as dictionaries. I want the derived classes to change only part of the dictionary defined by the base class. Is this possible, or am I better off with plain old classes?
The code snippet below shows my current approach, which seems wasteful in terms of code duplication.
In this example I could define a function that accepts a single parameter instead of the lambdas, but in a real-world example I would have to define a function for every such case, which would be cumbersome.

from dataclasses import dataclass, field


@dataclass
class BaseDataClass:
    simple_field_one: int = 100
    simple_field_two: int = 200
    complex_field: dict = field(default_factory=lambda: {
        'x': 0.1,
        'y': ['a', 'b']
    })


@dataclass
class DerivedDataClass(BaseDataClass):
    simple_field_two: int = 300  # this is easy
    complex_field: dict = field(default_factory=lambda: {
        'x': 0.1,
        'y': ['a', 'c']
    })  # this is wasteful. All I changed was complex_field['y'][1]
erap129

2 Answers

I use dataclasses this way quite extensively, and it seems to work quite well.

One difference I made, however, is to make the complex fields their own dataclasses (see "Python nested dataclasses ...is this valid?").

You might want to consider that approach and see how it may help you cut down some of the verbosity you're seeing.
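
As a rough illustration of that idea, here is a minimal sketch applied to the classes from the question. The names ComplexField and DerivedComplexField are hypothetical and chosen only for this example; the answer does not show its own code, so this is one possible reading of it, not the author's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexField:
    # Hypothetical nested dataclass mirroring the dictionary keys in the question.
    x: float = 0.1
    y: list = field(default_factory=lambda: ['a', 'b'])


@dataclass
class DerivedComplexField(ComplexField):
    # Only the attribute that differs is redefined; x is inherited unchanged.
    y: list = field(default_factory=lambda: ['a', 'c'])


@dataclass
class BaseDataClass:
    simple_field_one: int = 100
    simple_field_two: int = 200
    complex_field: ComplexField = field(default_factory=ComplexField)


@dataclass
class DerivedDataClass(BaseDataClass):
    simple_field_two: int = 300
    # The class itself serves as the factory, so no lambda is needed here.
    complex_field: ComplexField = field(default_factory=DerivedComplexField)
```

With this layout the derived class still restates the whole list for y, but it no longer has to repeat x or any other unchanged key.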

Richard

This might be obvious, but if the change is very small it could be convenient to use __post_init__ to apply it instead of redefining the field:

from dataclasses import dataclass, field


@dataclass
class BaseDataClass:
    simple_field_one: int = 100
    simple_field_two: int = 200
    complex_field: dict = field(default_factory=lambda: {
        'x': 0.1,
        'y': ['a', 'b']
    })


@dataclass
class DerivedDataClass(BaseDataClass):
    simple_field_two: int = 300

    def __post_init__(self):
        self.complex_field['y'][1] = 'c'
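
To see how this behaves, the following sketch repeats the two classes and inspects a few instances. It also shows one caveat worth keeping in mind: __post_init__ runs after every construction, so it patches a dictionary passed in explicitly as well. The variable names base, derived, and custom are only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BaseDataClass:
    simple_field_one: int = 100
    simple_field_two: int = 200
    complex_field: dict = field(default_factory=lambda: {
        'x': 0.1,
        'y': ['a', 'b']
    })


@dataclass
class DerivedDataClass(BaseDataClass):
    simple_field_two: int = 300

    def __post_init__(self):
        # Patch only the part of the inherited default that differs.
        self.complex_field['y'][1] = 'c'


base = BaseDataClass()
derived = DerivedDataClass()
print(base.complex_field)     # {'x': 0.1, 'y': ['a', 'b']}
print(derived.complex_field)  # {'x': 0.1, 'y': ['a', 'c']}

# Caveat: a dictionary supplied by the caller is patched too.
custom = DerivedDataClass(complex_field={'x': 1.0, 'y': ['p', 'q']})
print(custom.complex_field)   # {'x': 1.0, 'y': ['p', 'c']}
```

Because default_factory builds a fresh dictionary for every instance, patching it in __post_init__ does not affect BaseDataClass instances or other DerivedDataClass instances.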

A slightly different alternative, in case you want to be able to control the update to complex_field during initialization, is to pass the update in as an InitVar:

from copy import deepcopy
from dataclasses import dataclass, field, InitVar

...

@dataclass
class DerivedDataClass(BaseDataClass):
    simple_field_two: int = 300
    # A mutable default is accepted for an InitVar, but the same object is shared
    # by every instance, so it is copied before being merged in post_init.
    complex_update: InitVar[dict] = {'y': ['a', 'c']}

    def __post_init__(self, complex_update):
        # dict.update would otherwise store a reference to the default's inner list
        self.complex_field.update(deepcopy(complex_update))
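
For completeness, here is a self-contained sketch of the InitVar variant together with a few calls. It deep-copies the update before merging it, so that instances do not end up sharing the nested list from the class-level default; treat that as a defensive assumption about the desired behavior. The variable names default and custom are only for illustration.

```python
from copy import deepcopy
from dataclasses import dataclass, field, InitVar


@dataclass
class BaseDataClass:
    simple_field_one: int = 100
    simple_field_two: int = 200
    complex_field: dict = field(default_factory=lambda: {
        'x': 0.1,
        'y': ['a', 'b']
    })


@dataclass
class DerivedDataClass(BaseDataClass):
    simple_field_two: int = 300
    # Init-only parameter: passed to __post_init__, not stored as a field.
    complex_update: InitVar[dict] = {'y': ['a', 'c']}

    def __post_init__(self, complex_update):
        # Copy first so the shared default is never aliased by an instance.
        self.complex_field.update(deepcopy(complex_update))


default = DerivedDataClass()
print(default.complex_field)  # {'x': 0.1, 'y': ['a', 'c']}

# The caller can choose a different update at construction time.
custom = DerivedDataClass(complex_update={'x': 0.5})
print(custom.complex_field)   # {'x': 0.5, 'y': ['a', 'b']}

# Mutating one instance leaves the default for later instances untouched.
default.complex_field['y'].append('z')
print(DerivedDataClass().complex_field)  # {'x': 0.1, 'y': ['a', 'c']}
```

Note that dict.update is shallow: a key present in complex_update replaces the whole value under that key, so changing a single list element still means supplying the full list.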
Arne