Keeping multiple dictionaries in sync when one is the super set of the rest

Question

I want to have a class with dictionaries that contain nodes, but nodes can be of several sorts: points, vertices,... (and possible others) I want to efficiently loop and access over either one of these (over all nodes, just over points, just over vertices...).

The class would therefore be:

class A:
   _nodes: dict
   _points: dict
   _vertices: dict

Bu surely, if I modify points, it will not modify nodes and vice-versa.

My first idea was to use data descriptors to keep them in sync, but this didn't really work (or at least I don't know how to implement this). Is it possible?

I can just implement nodes a dict of dicts and implement vertices and points as properties.

class A:
   _nodes = dict{"points": dict(), "vertices": dict()}
   @property
   def _points(self):
        return(self._nodes["points"])
   @property
   def _vertices(self):
        return(self._nodes["vertices"])

But I'm thinking that it might be time consuming to compute the hash and have additional dictionary lookups every time I will access either points or vertices.

And also, ideally, I wanted to have a public cached property view for nodes, vertices and points (and updating these using data desciptors)

class NodeDescriptor():
    def __set__(self, obj, value):
        od = obj.__dict__
        od["_nodes"] = value
        if "nodes" in od: del od["nodes"]

 class PointDescriptor(): ...
 class VertexDescriptor(): ...

class A:
   _nodes = NodeDesciptor()
   _points = PointDesciptor()
   _vertices = VertexDesciptor()

   @cached_property
   def nodes(self): return NodeView(self._nodes)
   @cached_property
   def points(self): return NodeView(self._points)
   @cached_property
   def vertices(self): return NodeView(self._vertices)

But here the idea to keep nodes as a dict of dict does not work, since we cannot call the descriptor whenever we change nodes["vertex"] = new_dict()

What operations do you need? Do you ever modify `_nodes` directly, or only `_points` and `_vertices`, so that `_nodes` could be a "view" of the union of the others (just allowing to *read*)? — Kelly Bundy, May 08 '23 at 11:17
Mostly I need acessing the _nodes information by name, where the type does not really matter. But seldomly I need to access only points or vertices and also know how many of them there are. But since these these operations are not that very often, maybe a filtered view? — Jaka Belec, May 11 '23 at 19:55
You mean _nodes being a dict actually storing the data and the other two being filtering views? I meant the [opposite](https://stackoverflow.com/q/23392976/12671057). — Kelly Bundy, May 11 '23 at 20:02

score 1 · Answer 1 · answered May 08 '23 at 07:36

1

Avoid unnecessary duplication, and implement one set of properties in terms of the other. I might make points and vertices the source of truth and then have nodes just return an iterator that includes both of them:

from dataclasses import dataclass
from itertools import chain
from typing import Iterator

Point = tuple[float, float]
Vertex = tuple[Point, Point]

@dataclass
class A:
    points: dict[str, Point]
    vertices: dict[str, Vertex]

    @property
    def nodes(self) -> Iterator[Point|Vertex]:
        return chain(self.points.values(), self.vertices.values())

answered May 08 '23 at 07:36

Samwise

68,105
3
30
44

But often I do not care or do not know what the node type is and would like to modify/access/delete it, so I I cannot access it by `obj.nodes["a"] = new_vertex` or `del obj.nodes["a"]`. I would need to create a method that would first check in what dict the node is or loop over these if there would be more types. – Jaka Belec May 08 '23 at 07:42
Sorry, I do not care about deletion or modification, but I would like to access a node by name, not knowing what type it is, `v = obj.node["a"]`. – Jaka Belec May 08 '23 at 07:48
You might want to think more carefully about your data model -- what if a point and vertex have the same name? – Samwise May 08 '23 at 14:53
I've been thinking about it for a month now, that is why I am asking here :) I could just have '_nodes', and have filters (properties) for accessing '_vertices and '_points', but this would not provide be the most efficient looping through vertices adn points. – Jaka Belec May 08 '23 at 16:50
I don't think you understood my question -- it's not a matter of how you implement the interface, but the interface itself. Are you *sure* that you want all these things in the same namespace? What are the situations in which a point and vertex might have the same name? What are the situations in which you might want to access a node by name and not care what its type is? What would you want the behavior to be when those situations intersect? – Samwise May 08 '23 at 16:58
Indeed, that is the problem I have. I don't want they to have the same name. At the moment I decided to store all into one dict (so access is easy and no duplicates), although I like the solution of Samwise also. I need to learn some more Python to understad jsbueno's solution. – Jaka Belec May 11 '23 at 19:52

jsbueno · Answer 2 · 2023-05-08T17:35:07.313

I think a custom dictionary that would keep the meta-information on a node type alongside the value could fit you.

That will allow for O(1) retrieval and updating - and O(m + n) (m, n being the amount of Nodes of either type) for any action that would need iteration over the types - but the code would be much simpler.

It is possible to inherit from UserDict, and then an extra custom class that will bind the node types for retrieving and set data in the dictionary:

from collections import UserDict
from collections.abc import MutableMapping
from weakref import WeakKeyDictionary, proxy


class NodeType:
    def __init__(self, type_):
        self.type = type_
    
    def __set_name__(self, owner, name):
        self.name = name
        
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return ProxyDict(instance, self.type)


class ProxyDict(MutableMapping):
    instance_cache = WeakKeyDictionary()
    def __new__(cls, instance, type_):
        container = cls.instance_cache.setdefault(instance, {})
        if type_ in container:
            return container[type_]
        self = super().__new__(cls)
        self.instance = proxy(instance)
        self.type = type_
        container[type_] = self
        return self

    def __init__(self, *args):
        # just implemented to prevent an error at object.__init__, as
        # initalization is done in __new__
        pass
    
    def __getitem__(self, key):
        return self.instance.__getitem__(key, self.type)
    def __setitem__(self, key, value):
        return self.instance.__setitem__(key, value)
    def __delitem__(self, key):
        return self.instance.__delitem__(key, self.type)
    def __iter__(self):
        yield from (key for key, (t, v) in self.instance._items() if t==self.type)
    def __len__(self):
        # Ok  - len will be slow. 
        return sum(1 for _ in iter(self))
    def __repr__(self):
        return repr(dict(self))
    
class A(UserDict):
    points = NodeType("Point")
    vertices = NodeType("Vertex")
    _counter_id = 0
    
    def __init__(self, *args, **kwargs):
        self._id = self.__class__._counter_id
        self.__class__._counter_id += 1
        super().__init__(*args, **kwargs)
    
    def __getitem__(self, key, type_=None):
        it_type, item = super().__getitem__(key)
        if type_ in (it_type, None):
            return item
        raise KeyError(key)
    
    def __setitem__(self, key, value, type_=None):
        if type_ is None:
            # if the class name is "Vertex
            type_ = type(value).__name__
        super().__setitem__(key, (type_, value))
 
    def __delitem__(self, key, type_=None):
        it_type, item = super.__getitem__(key)
        if type_ in (it_type, None):
            super().__delitem__(key)
        raise KeyError(key)
    def _items(self):
        # same as .items(), but also return the type of each value
        for key in self:
            yield (key, super().__getitem__(key))
    def __hash__(self):
        return self._id

I think from there you can fine tune what you want for __repr__ in each case, and what to do on name clashes, and such. I made a somewhat complex cache system with weakrefs, but it might not even be necessary, with a new proxy-dict being instantiated at each access to a.vertices, etc... it would not be bad: it would not be much worse than what Python does for method access in an instance.

Here is the code above in use in the interactive mode:


In [137]: class Vertex: pass

In [138]: class Point: pass

In [139]: a = A()

In [140]: a["bla"] = Vertex()

In [141]: a["ble"] = Point()

In [142]: a.vertices["bla"]
Out[142]: <__main__.Vertex at 0x7f6dc6dbbb90>

In [143]: a.vertices["ble"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
[...]
KeyError: 'ble'

In [144]: a.points["ble"]
Out[144]: <__main__.Point at 0x7f6dcc91a110>

Keeping multiple dictionaries in sync when one is the super set of the rest

2 Answers2