
There has already been a fair bit of discussion about this. I found this post particularly helpful, and it seems to provide one of the best solutions.

But there is a problem with the recommended solution.

Well, it seems to work great at first. Consider a simple test case without properties:

from dataclasses import dataclass

@dataclass
class Foo:
    x: int
>>> # Instantiate the class
>>> f = Foo(2)
>>> # Nice, it works!
>>> f.x
2

Now try to implement x as a property using the recommended solution:

from dataclasses import dataclass, field

@dataclass
class Foo:
    x: int
    _x: int = field(init=False, repr=False)
    
    @property
    def x(self):
        return self._x
    
    @x.setter
    def x(self, value):
        self._x = value
>>> # Instantiate while explicitly passing `x`
>>> f = Foo(2)
>>> # Still appears to work
>>> f.x
2

But wait...

>>> # Instantiate without any arguments
>>> f = Foo()
>>> # Oops...! Property `x` has never been initialized. Now we have a bug :(
>>> f.x
<property object at 0x10d2a8130>

Really, the expected behavior here would be:

>>> # Instantiate without any arguments
>>> f = Foo()
TypeError: __init__() missing 1 required positional argument: 'x'

It seems that the dataclass field has been overridden by the property... any thoughts on how to get around this?


2 Answers


Using a property in a dataclass that shares the name of an argument of the __init__ method has an interesting side effect. When the class is instantiated with no argument, the property object is passed as the default.
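
A quick way to see where this default comes from is to inspect the signature of the generated __init__() (an illustrative check; the exact memory address will differ):

>>> import inspect
>>> inspect.signature(Foo.__init__)
<Signature (self, x: int = <property object at 0x10d2a8130>) -> None>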

As a work-around, you can check the type of x in __post_init__.

from dataclasses import dataclass, field

@dataclass
class Foo:
    x: int
    _x: int = field(init=False, repr=False)

    def __post_init__(self):
        if isinstance(self.x, property):
            raise TypeError("__init__() missing 1 required positional argument: 'x'")

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

Now when instantiating Foo, passing no argument raises the expected exception.

f = Foo()
# raises TypeError

f = Foo(1)
f
# returns
Foo(x=1)

Here is a more generalized solution for when multiple properties are being used. This uses InitVar to pass parameters to the __post_init__ method. It DOES require that the properties are listed first, and that their respective storage attributes have the same name with a leading underscore.

This is pretty hacky, and the properties no longer show up in the repr.

import inspect
from dataclasses import dataclass, field, InitVar


@dataclass
class Foo:
    x: InitVar[int]
    y: InitVar[int]
    _x: int = field(init=False, repr=False, default=None)
    _y: int = field(init=False, repr=False, default=None)

    def __post_init__(self, *args):
        if m := sum(isinstance(arg, property) for arg in args):
            s = 's' if m>1 else ''
            raise TypeError(f'__init__() missing {m} required positional argument{s}.')

        arg_names = inspect.getfullargspec(self.__class__).args[1:]
        for arg_name, val in zip(arg_names, args):
            self.__setattr__('_' + arg_name, val)

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value
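
For reference, this generalized Foo then behaves as follows (a quick sketch of the expected output; since _x and _y have repr=False, they won't show up in the repr):

f = Foo(1, 2)
f.x, f.y
# returns
(1, 2)

f = Foo(1)
# raises TypeError: __init__() missing 1 required positional argument.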
James
  • I like that it is simple, but it feels quite hacky. Can you show a more general solution in case there are multiple property-fields without default values? Do we need to hardcode them into the post-init check? – corvus Nov 05 '21 at 14:36
  • @James the latter approach is a good idea, but it has a few flaws, at least from testing. For example if you define a dataclass field at the top like `a: str` not associated with a property, the `inspect.getfullargspec` call returns a 3-element tuple of [a, x, y] rather than the expected [x, y]. – rv.kvetch Nov 06 '21 at 17:10
  • One other important thing I noted, is that the `__post_init__` sets the internal attributes and doesn't go through the property setter. At least, i feel it's a good idea to go through the property here, as it generally might have some validation logic in the setter method. – rv.kvetch Nov 06 '21 at 17:13

Using properties in dataclasses actually has a curious effect, as @James also pointed out. In fact, this issue isn't constrained to dataclasses alone; it comes down to the order in which you declare (or re-declare) a variable.

To elaborate, consider what happens when you do something like this, using just a simple class:

class Foo:
    x: int = 2

    @property
    def x(self):
        return self._x

But watch what happens when you now do:

>>> Foo.x
<property object at 0x00000263C50ECC78>

So what happened? Clearly, the property method declaration overwrote the attribute that we declared as x: int = 2.

In fact, at the time that the @dataclass decorator runs (which is once the class definition of Foo is complete), this is actually what it sees as the definition of x:

x: int = <property object at 0x00000263C50ECC78>

Confusing, right? The @dataclass decorator still sees the class annotations present in Foo.__annotations__, but it also sees the property object (with the getter we declared) bound to the name x after the dataclass field. It's important to note that this result is not a bug in any way; however, since dataclasses doesn't explicitly check for a property object, it simply treats the value after the assignment operator = as the field's default value. That is why we observe a <property object at 0x00000263C50ECC78> passed in as the default to the constructor when we don't explicitly pass a value for the field property x.
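
To make this concrete, you can apply the @dataclass decorator manually to the plain Foo above and look at the field metadata; the property object shows up as the field's default (an illustrative check; addresses will differ):

>>> from dataclasses import dataclass, fields
>>> Bar = dataclass(Foo)  # apply the decorator to the plain class defined above
>>> fields(Bar)[0].default
<property object at 0x00000263C50ECC78>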

This is quite an interesting consequence to keep in mind. In fact, I also came up with a section on Using Field Properties which goes over this same behavior and some unexpected consequences of it.

Properties with Required Values

Here's a generalized metaclass approach that might prove useful for automation purposes, assuming what you want to do is raise a TypeError when values for any field properties are not passed in to the constructor. I also created an optimized, modified version of it in a public gist.

What this metaclass does is generate a __post_init__() for the class, and for each field property declared it checks if a property object has been set as a default in the __init__() method generated by the @dataclass decorator; this indicates no value was passed in to the constructor for the field property, so a properly formatted TypeError is then raised to the caller. I adapted this metaclass approach from @James's answer above.

Note: The following example should work in Python 3.7+

from __future__ import annotations

from collections import deque
# noinspection PyProtectedMember
from dataclasses import _create_fn
from logging import getLogger

log = getLogger(__name__)


def require_field_properties(name, bases=None, cls_dict=None) -> type:
    """
    A metaclass which ensures that values for field properties are passed in
    to the __init__() method.

    Accepts the same arguments as the builtin `type` function::

        type(name, bases, dict) -> a new type
    """

    # annotations can also be forward-declared, i.e. as a string
    cls_annotations: dict[str, type | str] = cls_dict['__annotations__']
    # we're going to be doing a lot of `append`s, so might be better to use a
    # deque here rather than a list.
    body_lines: deque[str] = deque()

    # Loop over and identify all dataclass fields with associated properties.
    # Note that dataclasses._create_fn() uses 2 spaces for the initial indent.
    for field, annotation in cls_annotations.items():
        if field in cls_dict and isinstance(cls_dict[field], property):
            body_lines.append(f'if isinstance(self.{field}, property):')
            body_lines.append(f"  missing_fields.append('{field}')")

    # only add a __post_init__() if there are field properties in the class
    if not body_lines:
        cls = type(name, bases, cls_dict)
        return cls

    body_lines.appendleft('missing_fields = []')
    # to check if there are any missing arguments for field properties
    body_lines.append('if missing_fields:')
    body_lines.append("  s = 's' if len(missing_fields) > 1 else ''")
    body_lines.append("  args = (', and' if len(missing_fields) > 2 else ' and')"
                      ".join(', '.join(map(repr, missing_fields)).rsplit(',', 1))")
    body_lines.append('  raise TypeError('
                      "f'__init__() missing {len(missing_fields)} required "
                      "positional argument{s}: {args}')")

    # does the class define a __post_init__() ?
    if '__post_init__' in cls_dict:
        fn_locals = {'_orig_post_init': cls_dict['__post_init__']}
        body_lines.append('_orig_post_init(self, *args)')
    else:
        fn_locals = None

    # generate a new __post_init__ method
    _post_init_fn = _create_fn('__post_init__',
                               ('self', '*args'),
                               body_lines,
                               globals=cls_dict,
                               locals=fn_locals,
                               return_type=None)

    # Set the __post_init__() attribute on the class
    cls_dict['__post_init__'] = _post_init_fn

    # (Optional) Print the body of the generated method definition
    log.debug('Generated a body definition for %s.__post_init__():',
              name)
    log.debug('%s\n  %s', '-------', '\n  '.join(body_lines))
    log.debug('-------')

    cls = type(name, bases, cls_dict)
    return cls

And a sample usage of the metaclass:

from dataclasses import dataclass, field
from logging import basicConfig

from metaclasses import require_field_properties

basicConfig(level='DEBUG')


@dataclass
class Foo(metaclass=require_field_properties):
    a: str
    x: int
    y: bool
    z: float
    # the following definitions are not needed
    _x: int = field(init=False, repr=False)
    _y: bool = field(init=False, repr=False)
    _z: float = field(init=False, repr=False)

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        print(f'Setting x: {value!r}')
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        print(f'Setting y: {value!r}')
        self._y = value

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, value):
        print(f'Setting z: {value!r}')
        self._z = value


if __name__ == '__main__':
    foo1 = Foo(a='a value', x=1, y=True, z=2.3)
    print('Foo1:', foo1)
    print()
    foo2 = Foo('hello', 123)
    print('Foo2:', foo2)

Output now appears to be as desired:

DEBUG:metaclasses:Generated a body definition for Foo.__post_init__():
DEBUG:metaclasses:-------
  missing_fields = []
  if isinstance(self.x, property):
    missing_fields.append('x')
  if isinstance(self.y, property):
    missing_fields.append('y')
  if isinstance(self.z, property):
    missing_fields.append('z')
  if missing_fields:
    s = 's' if len(missing_fields) > 1 else ''
    args = (', and' if len(missing_fields) > 2 else ' and').join(', '.join(map(repr, missing_fields)).rsplit(',', 1))
    raise TypeError(f'__init__() missing {len(missing_fields)} required positional argument{s}: {args}')
DEBUG:metaclasses:-------

Setting x: 1
Setting y: True
Setting z: 2.3
Foo1: Foo(a='a value', x=1, y=True, z=2.3)

Setting x: 123
Setting y: <property object at 0x10c2c2350>
Setting z: <property object at 0x10c2c23b0>

Traceback (most recent call last):
  ...
    foo2 = Foo('hello', 123)
  File "<string>", line 7, in __init__
  File "<string>", line 13, in __post_init__
TypeError: __init__() missing 2 required positional arguments: 'y' and 'z'

So the above solution does work as expected; however, it's a lot of code, so it's worth asking: why not write less code and define the __post_init__ in the class itself, rather than go through a metaclass? The core reason here is performance. Ideally, you want to minimize the overhead of creating a new Foo object in the above case, for example.

So in order to explore that a bit further, I've put together a small test case to compare the performance of a metaclass approach against a __post_init__ approach using the inspect module to retrieve the field properties of the class at runtime. Here is the example code below:

import inspect
from dataclasses import dataclass, InitVar

from metaclasses import require_field_properties


@dataclass
class Foo1(metaclass=require_field_properties):
    a: str
    x: int
    y: bool
    z: float

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, value):
        self._z = value


@dataclass
class Foo2:
    a: str
    x: InitVar[int]
    y: InitVar[bool]
    z: InitVar[float]

    # noinspection PyDataclass
    def __post_init__(self, *args):
        if m := sum(isinstance(arg, property) for arg in args):
            s = 's' if m > 1 else ''
            raise TypeError(f'__init__() missing {m} required positional argument{s}.')

        arg_names = inspect.getfullargspec(self.__class__).args[2:]
        for arg_name, val in zip(arg_names, args):
            # setattr calls the property defined for each field
            self.__setattr__(arg_name, val)

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value

    @property
    def z(self):
        return self._z

    @z.setter
    def z(self, value):
        self._z = value


if __name__ == '__main__':
    from timeit import timeit

    n = 1
    iterations = 1000

    print('Metaclass:  ', timeit(f"""
for i in range({iterations}):
    _ = Foo1(a='a value' * i, x=i, y=i % 2 == 0, z=i * 1.5)
""", globals=globals(), number=n))

    print('InitVar:    ', timeit(f"""
for i in range({iterations}):
    _ = Foo2(a='a value' * i, x=i, y=i % 2 == 0, z=i * 1.5)
""", globals=globals(), number=n))

And here are the results when I test in a Python 3.9 environment with N=1000 iterations, on Mac OS X (Big Sur):

Metaclass:   0.0024892739999999997
InitVar:     0.034604513

Not surprisingly, the metaclass approach is overall more efficient when creating multiple Foo objects: more than 10x faster in this run. The reason is that it only has to determine the field properties defined in a class once, at class creation time, and it then generates a __post_init__ specifically for those fields, whereas the InitVar approach calls inspect.getfullargspec on every instantiation. Overall the result is that it performs better, even though it technically requires more code and setup in order to get there.

Properties with Default Values

Suppose that you instead don't want to raise an error when x is not explicitly passed in to the constructor; maybe you just want to set a default value, like None or an int value like 3 for example.

I've created a metaclass approach specifically designed to handle this scenario. There's also the original gist you can check out if you want an idea of how it was implemented, or you can look at the source code directly if you're curious. In any case, here's the solution I've come up with below; note that it involves a third-party library, as unfortunately this behavior is not baked into the dataclasses module at present.

from __future__ import annotations

from dataclasses import dataclass, field

from dataclass_wizard import property_wizard


@dataclass
class Foo(metaclass=property_wizard):
    x: int | None
    _x: int = field(init=False, repr=False)  # technically, not needed

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        print(f'Setting x to: {value!r}')
        self._x = value


if __name__ == '__main__':
    f = Foo(2)
    assert f.x == 2

    f = Foo()
    assert f.x is None

This is the output with the metaclass approach:

Setting x to: 2
Setting x to: None

And the output with the @dataclass decorator alone - also as observed in the question above:

Setting x to: 2
Setting x to: <property object at 0x000002D65A9950E8>

Traceback (most recent call last):
  ...
    assert f.x is None
AssertionError

Specifying a Default Value

Lastly, here's an example of setting an explicit default value for the property, using a property defined with a leading underscore _ to distinguish it from the dataclass field which has a public name.

from dataclasses import dataclass

from dataclass_wizard import property_wizard


@dataclass
class Foo(metaclass=property_wizard):
    x: int = 1

    @property
    def _x(self):
        return self._x

    @_x.setter
    def _x(self, value):
        print(f'Setting x to: {value!r}')
        self._x = value


if __name__ == '__main__':
    f = Foo(2)
    assert f.x == 2

    f = Foo()
    assert f.x == 1

Output:

Setting x to: 2
Setting x to: 1
rv.kvetch
  • +1 for clarifying that the variable name is simply being overwritten by the property definition. I understood this, but of course it seems quite obvious when you consider that the same name is being assigned to twice in the same scope. I guess the only way is to move the property creation to a different scope, e.g. outside the class (as suggested by morlind in the comments to the linked article), or else into `__post_init__`. I actually wonder if this last option may not make the most sense. – corvus Nov 05 '21 at 14:28
  • @corvus I actually don't prefer moving the property creation outside the class, for the same reasons I outlined in the linked article. For me the main issue is that it's easy to forget to add it if you're adding another field property to a class for example. I much rather prefer the `__post_init__` as mentioned (also the approach suggested by @James here) because it's contained within the class, and since it's near the top of the class definition it's easy to change or modify it as needed. The downside is I guess you'd need to do this manually for each new property you plan to add. – rv.kvetch Nov 05 '21 at 22:32
  • @corvus I've added a more general metaclass approach that can be used to automate the checking if a value is passed for field properties. It looks to be working so far, but probably worth making some improvements. I'll leave it as it is for now. – rv.kvetch Nov 06 '21 at 02:38