Using properties in dataclasses actually has a curious effect, as @James also pointed out. This issue isn't constrained to dataclasses alone; it happens due to the order in which you declare (or re-declare) a variable.
To elaborate, consider what happens when you do something like this, using just a simple class:
```python
class Foo:
    x: int = 2

    @property
    def x(self):
        return self._x
```
But watch what happens when you now do:
```python
>>> Foo.x
<property object at 0x00000263C50ECC78>
```
So what happened? Clearly, the `property` declaration overwrote the attribute that we declared as `x: int = 2`.

In fact, by the time the `@dataclass` decorator runs (which is once the class definition of `Foo` is complete), this is what it actually sees as the definition of `x`:

```
x: int = <property object at 0x00000263C50ECC78>
```
Confusing, right? It still sees the class annotations that are present in `Foo.__annotations__`, but it also sees the `property` object with the getter that we declared after the dataclass field. It's important to note that this is not a bug in any way; however, since `dataclasses` doesn't explicitly check for a `property` object, it treats the value after the assignment (`=`) operator as a default value, and so we observe a `<property object at 0x00000263C50ECC78>` passed in as the default value to the constructor when we don't explicitly pass a value for the field property `x`.
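Both halves of this can be verified directly; here's a quick sketch:

```python
class Foo:
    x: int = 2

    @property
    def x(self):
        return self._x


# the annotation from `x: int = 2` is still recorded...
print(Foo.__annotations__)   # {'x': <class 'int'>}

# ...but the class attribute `x` is now the property object
print(type(Foo.x))           # <class 'property'>
```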
This is quite an interesting consequence to keep in mind. In fact, I also came up with a section on Using Field Properties which goes over this same behavior and some unexpected consequences of it.
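To illustrate the consequence at construction time, here's a minimal sketch with a getter/setter pair: when no value is passed for the field, the property object itself flows through `__init__()` as the default value.

```python
from dataclasses import dataclass


@dataclass
class Foo:
    x: int

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value


f = Foo()          # works, despite `x` having no explicit default!
print(type(f.x))   # <class 'property'> -- the property object was the default
```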
Properties with Required Values
Here's a generalized metaclass approach that might prove useful for automation purposes, assuming what you want is to raise a `TypeError` when values for any field properties are not passed in to the constructor. I've also created an optimized, modified version of this approach in a public gist.
What this metaclass does is generate a `__post_init__()` for the class. For each field property declared, it checks whether a `property` object has been set as the default in the `__init__()` method generated by the `@dataclass` decorator; that indicates no value was passed in to the constructor for the field property, so a properly formatted `TypeError` is raised to the caller. I adapted this metaclass approach from @James's answer above.
Note: the following example should work in Python 3.7+. Be aware that it relies on `dataclasses._create_fn()`, a private helper that isn't part of the public API and may change or disappear in a future Python version.
```python
from __future__ import annotations

from collections import deque
# noinspection PyProtectedMember
from dataclasses import _create_fn
from logging import getLogger

log = getLogger(__name__)


def require_field_properties(name, bases=None, cls_dict=None) -> type:
    """
    A metaclass which ensures that values for field properties are passed in
    to the __init__() method.

    Accepts the same arguments as the builtin `type` function::

        type(name, bases, dict) -> a new type

    """
    # annotations can also be forward-declared, i.e. as a string
    cls_annotations: dict[str, type | str] = cls_dict['__annotations__']

    # we're going to be doing a lot of `append`s, so might be better to use a
    # deque here rather than a list.
    body_lines: deque[str] = deque()

    # Loop over and identify all dataclass fields with associated properties.
    # Note that dataclasses._create_fn() uses 2 spaces for the initial indent.
    for field, annotation in cls_annotations.items():
        if field in cls_dict and isinstance(cls_dict[field], property):
            body_lines.append(f'if isinstance(self.{field}, property):')
            body_lines.append(f"  missing_fields.append('{field}')")

    # only add a __post_init__() if there are field properties in the class
    if not body_lines:
        cls = type(name, bases, cls_dict)
        return cls

    body_lines.appendleft('missing_fields = []')

    # to check if there are any missing arguments for field properties
    body_lines.append('if missing_fields:')
    body_lines.append("  s = 's' if len(missing_fields) > 1 else ''")
    body_lines.append("  args = (', and' if len(missing_fields) > 2 else ' and')"
                      ".join(', '.join(map(repr, missing_fields)).rsplit(',', 1))")
    body_lines.append('  raise TypeError('
                      "f'__init__() missing {len(missing_fields)} required "
                      "positional argument{s}: {args}')")

    # does the class define a __post_init__() ?
    if '__post_init__' in cls_dict:
        fn_locals = {'_orig_post_init': cls_dict['__post_init__']}
        body_lines.append('_orig_post_init(self, *args)')
    else:
        fn_locals = None

    # generate a new __post_init__() method
    _post_init_fn = _create_fn('__post_init__',
                               ('self', '*args'),
                               body_lines,
                               globals=cls_dict,
                               locals=fn_locals,
                               return_type=None)

    # set the __post_init__() attribute on the class
    cls_dict['__post_init__'] = _post_init_fn

    # (Optional) Print the body of the generated method definition
    log.debug('Generated a body definition for %s.__post_init__():', name)
    log.debug('%s\n  %s', '-------', '\n  '.join(body_lines))
    log.debug('-------')

    cls = type(name, bases, cls_dict)
    return cls
```
And a sample usage of the metaclass:
```python
from dataclasses import dataclass, field
from logging import basicConfig

from metaclasses import require_field_properties

basicConfig(level='DEBUG')


@dataclass
class Foo(metaclass=require_field_properties):

    a: str
    x: int
    y: bool
    z: float

    # the following definitions are not needed
    _x: int = field(init=False, repr=False)
    _y: bool = field(init=False, repr=False)
    _z: float = field(init=False, repr=False)

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        print(f'Setting x: {value!r}')
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        print(f'Setting y: {value!r}')
        self._y = value

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, value):
        print(f'Setting z: {value!r}')
        self._z = value


if __name__ == '__main__':
    foo1 = Foo(a='a value', x=1, y=True, z=2.3)
    print('Foo1:', foo1)
    print()

    foo2 = Foo('hello', 123)
    print('Foo2:', foo2)
```
Output now appears to be as desired:
```
DEBUG:metaclasses:Generated a body definition for Foo.__post_init__():
DEBUG:metaclasses:-------
  missing_fields = []
  if isinstance(self.x, property):
    missing_fields.append('x')
  if isinstance(self.y, property):
    missing_fields.append('y')
  if isinstance(self.z, property):
    missing_fields.append('z')
  if missing_fields:
    s = 's' if len(missing_fields) > 1 else ''
    args = (', and' if len(missing_fields) > 2 else ' and').join(', '.join(map(repr, missing_fields)).rsplit(',', 1))
    raise TypeError(f'__init__() missing {len(missing_fields)} required positional argument{s}: {args}')
DEBUG:metaclasses:-------
Setting x: 1
Setting y: True
Setting z: 2.3
Foo1: Foo(a='a value', x=1, y=True, z=2.3)

Setting x: 123
Setting y: <property object at 0x10c2c2350>
Setting z: <property object at 0x10c2c23b0>
Traceback (most recent call last):
  ...
    foo2 = Foo('hello', 123)
  File "<string>", line 7, in __init__
  File "<string>", line 13, in __post_init__
TypeError: __init__() missing 2 required positional arguments: 'y' and 'z'
```
So the above solution works as expected; however, it's a lot of code, so it's worth asking: why not write less code and set the `__post_init__()` on the class itself, rather than going through a metaclass? The core reason is performance: you'd ideally want to minimize the overhead of creating each new `Foo` object in a case like the above.
To explore that a bit further, I've put together a small test case comparing the performance of the metaclass approach against a `__post_init__()` approach that uses the `inspect` module to retrieve the field properties of the class at runtime. Here is the example code:
```python
import inspect
from dataclasses import dataclass, InitVar

from metaclasses import require_field_properties


@dataclass
class Foo1(metaclass=require_field_properties):

    a: str
    x: int
    y: bool
    z: float

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, value):
        self._z = value


@dataclass
class Foo2:

    a: str
    x: InitVar[int]
    y: InitVar[bool]
    z: InitVar[float]

    # noinspection PyDataclass
    def __post_init__(self, *args):
        if m := sum(isinstance(arg, property) for arg in args):
            s = 's' if m > 1 else ''
            raise TypeError(f'__init__() missing {m} required positional argument{s}.')

        arg_names = inspect.getfullargspec(self.__class__).args[2:]
        for arg_name, val in zip(arg_names, args):
            # setattr calls the property defined for each field
            self.__setattr__(arg_name, val)

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, value):
        self._z = value


if __name__ == '__main__':
    from timeit import timeit

    n = 1
    iterations = 1000

    print('Metaclass: ', timeit(f"""
for i in range({iterations}):
    _ = Foo1(a='a value' * i, x=i, y=i % 2 == 0, z=i * 1.5)
""", globals=globals(), number=n))

    print('InitVar: ', timeit(f"""
for i in range({iterations}):
    _ = Foo2(a='a value' * i, x=i, y=i % 2 == 0, z=i * 1.5)
""", globals=globals(), number=n))
```
And here are the results when I test in a Python 3.9 environment with N=1000 iterations, on Mac OS X (Big Sur):

```
Metaclass:  0.0024892739999999997
InitVar:    0.034604513
```
Not surprisingly, the metaclass approach is more efficient overall when creating multiple `Foo` objects - on average about 10x faster. The reason is that it only has to determine the field properties defined in a class once, and then generates a `__post_init__()` specifically for those fields. Overall, it performs better, even though it technically requires more code and setup to get there.
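Much of the `InitVar` version's per-instance cost is the `inspect.getfullargspec()` call inside `__post_init__()`. As a middle ground, that lookup can be hoisted so it runs only once, right after the class is created. This is a sketch of my own (the class `Foo3` and the `_init_arg_names` attribute are illustrative names, not from the answer above):

```python
import inspect
from dataclasses import dataclass, InitVar


@dataclass
class Foo3:
    a: str
    x: InitVar[int]
    y: InitVar[bool]

    def __post_init__(self, *args):
        # `_init_arg_names` is computed once below, not on every instantiation
        for name, val in zip(self._init_arg_names, args):
            setattr(self, name, val)  # setattr invokes each property's setter

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value


# hoisted: inspect the generated __init__() a single time, at import;
# [2:] skips 'self' and the regular field 'a', leaving the InitVar names
Foo3._init_arg_names = inspect.getfullargspec(Foo3).args[2:]  # ['x', 'y']
```

This keeps the stdlib-only `__post_init__()` style while avoiding repeated introspection, though it still doesn't match the metaclass approach, which skips the per-instance loop entirely.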
Properties with Default Values
Suppose that you instead don't want to raise an error when `x` is not explicitly passed in to the constructor; maybe you just want to set a default value, like `None` or an `int` value such as 3.
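As a point of reference, for a single field this can also be emulated with just the standard library, by checking for the leaked property object in `__post_init__()` and swapping in the desired default. A minimal sketch:

```python
from dataclasses import dataclass


@dataclass
class Foo:
    x: int  # the property object below ends up as the implicit default

    def __post_init__(self):
        # when no value is passed, __init__ receives the property object itself
        if isinstance(self._x, property):
            self._x = None  # substitute the desired default

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value


f1 = Foo(2)
assert f1.x == 2

f2 = Foo()
assert f2.x is None
```

This works, but it has to be repeated per class and per field, which is exactly the boilerplate the metaclass below is meant to remove.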
I've created a metaclass approach designed specifically to handle this scenario. There's also the original gist you can check out if you want an idea of how it was implemented (or you can check out the source code directly if you're curious). In any case, here's the solution I've come up with; note that it involves a third-party library, as unfortunately this behavior isn't baked into the `dataclasses` module at present.
```python
from __future__ import annotations

from dataclasses import dataclass, field
from dataclass_wizard import property_wizard


@dataclass
class Foo(metaclass=property_wizard):

    x: int | None
    _x: int = field(init=False, repr=False)  # technically, not needed

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        print(f'Setting x to: {value!r}')
        self._x = value


if __name__ == '__main__':
    f = Foo(2)
    assert f.x == 2

    f = Foo()
    assert f.x is None
```
This is the output with the metaclass approach:
```
Setting x to: 2
Setting x to: None
```
And the output with the `@dataclass` decorator alone - also as observed in the question above:

```
Setting x to: 2
Setting x to: <property object at 0x000002D65A9950E8>
Traceback (most recent call last):
  ...
    assert f.x is None
AssertionError
```
Specifying a Default Value
Lastly, here's an example of setting an explicit default value for the property, using a property defined with a leading underscore (`_`) to distinguish it from the dataclass field, which has the public name.
```python
from dataclasses import dataclass
from dataclass_wizard import property_wizard


@dataclass
class Foo(metaclass=property_wizard):

    x: int = 1

    @property
    def _x(self):
        return self._x

    @_x.setter
    def _x(self, value):
        print(f'Setting x to: {value!r}')
        self._x = value


if __name__ == '__main__':
    f = Foo(2)
    assert f.x == 2

    f = Foo()
    assert f.x == 1
```
Output:
```
Setting x to: 2
Setting x to: 1
```