
I have a situation where I would like to be able to treat a frozen dataclass instance as always having the latest data. Or in other words, I'd like to be able to detect if a dataclass instance has had replace called on it and throw an exception. It should also only apply to that particular instance, so that creation/replacements of other dataclass instances of the same type do not affect each other.

Here is some sample code:

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AlwaysFreshData:
    fresh_data: str


def attempt_to_read_stale_data():
    original = AlwaysFreshData(fresh_data="fresh")
    unaffected = AlwaysFreshData(fresh_data="not affected")

    print(original.fresh_data)

    new = replace(original, fresh_data="even fresher")

    print(original.fresh_data) # I want this to trigger an exception now

    print(new.fresh_data)

The idea here is to rule out both accidental mutation of our dataclass objects and stale reads from them, since both lead to bugs.

Is it possible to do this, either through a base class or some other method?

EDIT: The intention here is to have a way of enforcing/verifying "ownership" semantics for dataclasses, even if it is only during runtime.

Here is a concrete example of a situation with regular dataclasses that is problematic.

@dataclass
class MutableData:
    my_string: str

def sneaky_modify_data(data: MutableData) -> None:
    some_side_effect(data)
    data.my_string = "something else" # Sneaky string modification

x = MutableData(my_string="hello")

sneaky_modify_data(x)

assert x.my_string == "hello" # as a caller of 'sneaky_modify_data', I don't expect that x.my_string would have changed!

This can be prevented by using frozen dataclasses! But then there is still a situation that can lead to potential bugs, as demonstrated below.

@dataclass(frozen=True)
class FrozenData:
    my_string: str

def modify_frozen_data(data: FrozenData) -> FrozenData:
    some_side_effect(data)
    return replace(data, my_string="something else")

x = FrozenData(my_string="hello")

y = modify_frozen_data(x)

some_other_function(x) # AHH! I probably wanted to use y here instead, since it was modified!

In summary, I want to prevent sneaky or unknown modifications to data, while also forcing invalidation of data that has been replaced, so that out-of-date data cannot be used by accident.

This situation might be familiar to some as being similar to the ownership semantics in something like Rust.

As for my specific situation, I already have a large amount of code that uses these semantics, except with NamedTuple instances instead. This works because modifying the _replace function lets me invalidate the instance it is called on. The same strategy doesn't work as cleanly for dataclasses, because dataclasses.replace is a module-level function rather than a method on the instances themselves.

  • It seems like what you actually want is a *non*-frozen dataclass, so you can update the value of `fresh_data`. – jonrsharpe Jul 01 '20 at 13:50
  • This seems to be an [X-Y problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). You are modifying data (using `replace`), but you want data to be frozen, so you are creating a new instance. At the same time, you don't want the old instance to be used, because it is no longer valid. Why not just make the object non-frozen as @jonrsharpe suggested? Then you won't have any of the problems you are now trying to solve. Explain your initial problem X, rather than asking for a solution for Y. – zvone Jul 03 '20 at 13:00
  • @zvone to be fair, the main purpose of `replace` is to be called with frozen dataclasses. It is good to point out that the problem hints at a design issue, but OP's circumstances, or those of anyone who finds this post through Google, might make a re-design infeasible. – Arne Jul 03 '20 at 14:18
  • @zvone I've updated my question with more detailed reasoning as to why I'm looking for this particular solution and also why non-frozen dataclasses are not suitable. – Sanchit Uttam Jul 03 '20 at 14:54
  • @Arne the purpose of `replace` is to make a copy and that is fine, but the question here is about flagging the original frozen object as used-up and that is very suspicious design, always. – zvone Jul 03 '20 at 17:13
  • @SanchitUttam You are trying to avoid _"sneaky"_ changes done in `some_side_effect`, because you don't trust it? You are not going to get anywhere if you can't trust what functions are doing. Secondly, the interface of `modify_frozen_data` should not be limited by its trust of some other function. You want to modify data in `modify_frozen_data`, so it should be modifiable. If you really cannot trust `some_side_effect` (but think about that again!), then don't send it your object - send a copy. – zvone Jul 03 '20 at 17:20
  • @zvone For copies, `copy.copy` and `copy.deepcopy` work perfectly fine for dataclasses, and [`replace` is targeted in particular to handling frozen dataclasses](https://github.com/python/cpython/blob/80526f68411a9406a9067095fbf6a0f88047cac5/Lib/dataclasses.py#L1236-L1239) where a copy+update wouldn't work. – Arne Jul 04 '20 at 19:48
  • @zvone I think there is a misunderstanding here. `some_side_effect` doesn't modify anything, it was just a way to represent some work that doesn't change the data. The idea behind using a frozen dataclass is to prevent modification without being explicit. However, it still doesn't prevent the usage of the "unmodified" data, which can lead to subtle, hard to track bugs. Now, obviously this sort of thing could be handled by being more careful and better code review, but it would be nice to be able to catch instances of this occurring before they cause bugs in production. – Sanchit Uttam Jul 06 '20 at 07:22
  • @SanchitUttam There is no misunderstanding. You can look at this same question from many different angles, but whichever angle you choose, I say what you are trying to do is wrong. You don't have to accept that. It is up to you ;) – zvone Jul 06 '20 at 18:58

1 Answer


I'd agree with Jon that keeping a proper inventory of your data and updating shared instances would be a better way to go about the problem. If that isn't possible or feasible for some reason (and you should seriously examine whether that reason is really important enough), there is a way to achieve what you described (good mockup, by the way). It requires some non-trivial code, though, and it places some constraints on your dataclass:

from dataclasses import dataclass, replace, field
from typing import Any, ClassVar


@dataclass(frozen=True)
class AlwaysFreshData:
    #: sentinel that is used to mark stale instances
    STALE: ClassVar = object()

    fresh_data: str
    #: private staleness indicator for this instance
    _freshness: Any = field(default=None, repr=False)

    def __post_init__(self):
        """Updates a donor instance to be stale now."""

        if self._freshness is None:
            # is a fresh instance
            pass
        elif self._freshness is self.STALE:
            # this case probably leads to inconsistent data, maybe raise an error?
            print(f'Warning: Building new {type(self)} instance from stale data - '
                  f'is that really what you want?')
        elif isinstance(self._freshness, type(self)):
            # is a fresh instance from an older, now stale instance
            object.__setattr__(self._freshness, '_freshness', self.STALE)
        else:
            raise ValueError("Don't mess with private attributes!")
        # a live instance points at itself, so replace() passes it on as the donor
        object.__setattr__(self, '_freshness', self)

    def __getattribute__(self, name):
        # look STALE up on the class: self.STALE would re-enter this method
        if object.__getattribute__(self, '_freshness') is type(self).STALE:
            raise RuntimeError('Instance went stale!')
        return object.__getattribute__(self, name)

Which will behave like this for your test code:

# basic functionality
>>> original = AlwaysFreshData(fresh_data="fresh")
>>> original.fresh_data
'fresh'
>>> new = replace(original, fresh_data="even fresher")
>>> new.fresh_data
'even fresher'

# if fresher data was used, the old instance is "disabled"
>>> original.fresh_data
Traceback (most recent call last):
  File [...] in __getattribute__
    raise RuntimeError('Instance went stale!')
RuntimeError: Instance went stale!

# defining a new, unrelated instance doesn't mess with existing ones
>>> runner_up = AlwaysFreshData(fresh_data="different freshness")
>>> runner_up.fresh_data
'different freshness'
>>> new.fresh_data  # still fresh
'even fresher'
>>> original.fresh_data  # still stale
Traceback (most recent call last):
  File [...] in __getattribute__
    raise RuntimeError('Instance went stale!')
RuntimeError: Instance went stale!

One important thing to note is that this approach introduces a new field to the dataclass, namely _freshness, which can potentially be set by hand and mess up the whole logic. You can try to catch it in __post_init__, but something like this would be a valid sneaky way to have an old instance stay fresh:

>>> original = AlwaysFreshData(fresh_data="fresh")
# calling replace with _freshness=None is a no-no, but we can't prohibit it
>>> new = replace(original, fresh_data="even fresher", _freshness=None)
>>> original.fresh_data
'fresh'
>>> new.fresh_data
'even fresher'

Additionally, we need a default value for it, which means that any fields declared below it also need a default value (which isn't too bad - just declare those fields above it), including all fields from future children (this is more of a problem, and there is a huge post on how to handle such a scenario).
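
To make the ordering constraint concrete, here is a minimal sketch. `Base`, `Child`, and `KwChild` are hypothetical names chosen for illustration, and `Base` only mimics the field layout of the class above. It shows that a child adding a field without a default is rejected when the class is defined. The `kw_only=True` variant is one possible way around that, assuming Python 3.10 or newer; it is not something the answer itself relies on.

```python
import sys
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Base:
    fresh_data: str
    # stand-in for the defaulted bookkeeping field from the answer
    _freshness: Any = field(default=None, repr=False)


# A child that adds a field without a default fails at class-definition time,
# because that field would follow the defaulted parent field in __init__.
try:
    @dataclass(frozen=True)
    class Child(Base):
        extra: int
except TypeError as exc:
    ordering_error = str(exc)
else:
    ordering_error = None

# Assumption: Python 3.10+, where keyword-only fields are exempt from the rule.
if sys.version_info >= (3, 10):
    @dataclass(frozen=True, kw_only=True)
    class KwChild(Base):
        extra: int

    kw_child = KwChild("fresh", extra=1)
```

With `kw_only=True` the child's own fields must be passed by keyword, which may or may not be acceptable for existing call sites.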

You also need a sentinel value available whenever you use this kind of pattern. This is not really bad, but it might be a strange concept to some people.
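
For readers who haven't met the sentinel idea before, a small generic sketch may help. `_MISSING` and `get_setting` are made-up names that have nothing to do with the dataclass above. The point is only that a fresh `object()` is identical to nothing but itself, so an `is` check can tell "nothing was passed" apart from a legitimate `None`.

```python
# Unique marker object: no other value can ever be identical to it.
_MISSING = object()


def get_setting(settings, key, default=_MISSING):
    """Return settings[key]; fall back to default only if one was given."""
    value = settings.get(key, _MISSING)
    if value is not _MISSING:
        return value  # may well be None, which is a real value here
    if default is _MISSING:
        raise KeyError(key)
    return default


stored_none = get_setting({"timeout": None}, "timeout")
explicit_default = get_setting({}, "timeout", default=None)
```

`STALE` in the class above plays the same role: it marks a state that no ordinary value of `_freshness` can be confused with.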

Arne
  • Thank you very much for the answer! Unfortunately, it doesn't seem to work the way I would expect. In this example: `a = AlwaysFreshData("A1")` `b = AlwaysFreshData("B1")` `print(a) # This triggers an exception, even though 'a' was not replaced` In the code example you've given, created instances interfere with other instances "freshness" – Sanchit Uttam Jul 03 '20 at 11:25
  • haha, that's a pretty big one to miss. I'll see if I can update my post. – Arne Jul 03 '20 at 11:41
  • @SanchitUttam I found a way. It's even a little shorter, but introduces potential side-effects. – Arne Jul 03 '20 at 12:32
  • Thank you! This does indeed answer the question that I asked for. Unfortunately, I cannot use the solution as it is because it does involve introducing a default property which brings up issues when attempting to inherit from the dataclass (as you mentioned in a link on your post). Despite that, I'll mark the question as answered. If you are curious, I managed to work around my problem by actually monkey-patching `dataclasses.replace`. I realise that it isn't a clean solution, but it's cleaner than anything else I've come up with – Sanchit Uttam Jul 03 '20 at 14:22
  • Glad you found a solution, and thanks for the accept! – Arne Jul 03 '20 at 14:32
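
The asker's actual workaround isn't shown in the thread, so the following is only a guess at what a `replace`-based approach could look like, not their code. `Invalidatable` and `replace_and_invalidate` are hypothetical names. The sketch wraps `dataclasses.replace` instead of rebinding it, and stores the stale flag in the instance `__dict__` rather than in a dataclass field, so subclasses can add fields without defaults. It assumes the dataclasses do not use `__slots__`.

```python
import dataclasses
from dataclasses import dataclass


class Invalidatable:
    """Hypothetical mixin: refuses attribute access once marked stale."""

    def __getattribute__(self, name):
        # Read the flag straight from the instance dict to avoid recursion.
        state = object.__getattribute__(self, "__dict__")
        if state.get("_stale", False):
            raise RuntimeError("Instance went stale!")
        return object.__getattribute__(self, name)


def replace_and_invalidate(obj, **changes):
    """Like dataclasses.replace, but marks the original as stale afterwards."""
    new = dataclasses.replace(obj, **changes)
    # object.__setattr__ bypasses the frozen dataclass's own __setattr__.
    object.__setattr__(obj, "_stale", True)
    return new


@dataclass(frozen=True)
class FrozenData(Invalidatable):
    my_string: str


@dataclass(frozen=True)
class ChildData(FrozenData):
    extra: int  # no default needed: the stale flag is not a dataclass field


x = FrozenData(my_string="hello")
y = replace_and_invalidate(x, my_string="something else")
other = ChildData(my_string="untouched", extra=1)
```

After this runs, reading `x.my_string` raises `RuntimeError`, while `y` and `other` stay usable. A real monkey-patch would also have to keep a reference to the original function before rebinding `dataclasses.replace`, and it would not affect modules that had already executed `from dataclasses import replace`.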