2

Context

I am developing a 2D animation system where the data is organized as a node-based tree structure with semantics and functionality very similar to XML DOM -- but not quite similar enough to just use an existing XML implementation. For the sake of clarity, let's simplify and assume that the data structure is conceptually defined via the following pseudo-code:

class Node {
    Node* firstChild;
    Node* nextSibling;
    Node* parent;
}

Following state-of-the-art guidelines of modern C++ (Core Guidelines, Herb Sutter talk, etc.), an appropriate choice would be to use smart pointers for owning handles, and raw pointers for non-owning handles. Since ownership is unique, then std::unique_ptr makes sense:

class Node {
    std::unique_ptr<Node> firstChild_;
    std::unique_ptr<Node> nextSibling_;
    Node* parent_;

public:
    Node() : parent_(nullptr) {}
    static std::unique_ptr<Node> make() { return std::make_unique<Node>(); }

    Node* firstChild() const { return firstChild_.get(); }
    Node* nextSibling() const { return nextSibling_.get(); }
    Node* parent() const { return parent_; }

    Node* appendChild(std::unique_ptr<Node> child) { ... }
    std::unique_ptr<Node> removeChild(Node* child) { ... }

    Node* makeChild() { return appendChild(make()); }
}

This API assumes that client code knows what it is doing. Like when using a list::iterator, clients need to read the doc to know which operations may invalidate nodes, and be careful not to blindly hold a Node* as a data member and dereference it later, since it may be dangling. I am on board with this approach for C++ clients: I believe this is idiomatic C++ and is the price you pay for performance.

However, I absolutely need to provide more safety for Python clients. The animation software is a C++ GUI program with an embedded Python console, which means that many Python clients are expected to be artists with very little programming experience, mostly copy-pasting code from tutorials and adapting it to fit their needs. To make things worse, such unreliable non-tested Python code might be executed while the user is in a long-lasting interactive session with potentially valuable unsaved data. The software should absolutely not crash when a Python client tries to use an expired node. Expected behavior is something along the lines of:

[ Embedded Python Console ]
>>> node = getSelectedNode() # allocated from C++ and selected in the GUI
>>> print(node)
<Node at 0x14e4c30 with 42 child>
>>> node.parent
<Node at 0x14e4d24 with 12 child>
>>> deleteSelection() # or via GUI interaction
>>> print(node)
<Invalid node: the node has already been deleted>
>>> print(node.parent)
InvalidNodeError: the node has already been deleted
>>>

Questions

How would you wrap this C++ API using pybind11 in order to get the expected behavior?

If necessary, how would you change the C++ API to allow for such behavior, and/or make the wrapper code more readable, idiomatic, etc.?

What I've already tried or considered

Naively wrapping as class_<Node>(m, "Node") does not work: the Python instance cannot know whether the node is still valid or not. Basically, returning Node* via take_ownership results in double deletion, and returning them either via reference or reference_internal results in undefined behavior (read, segmentation faults).

So we either need some PyNode trampoline class tracking the validity of nodes via other means (e.g., register for callbacks such as Node::onAboutToDie()), or changing the C++ ownership model to use ref-counted smart pointers.

What I believed was a good option was to change the C++ code to use shared_ptr instead of unique_ptr. I would have Node derive from enable_shared_from_this, and wrap it as class_<Node, PyWeakPtr<Node>>(m, "Node") where PyWeakPtr is a custom holder storing a weak_ptr under the hood. Since the nodes derive from enable_shared_from_this, the holder could have a PyWeakPtr(T*) constructor converting the raw pointer into a weak_ptr. In a nutshell, this would leverage the ref-counting ability of shared pointers not to have shared ownership (ownership is still unique), but just to be able to use weak references on Python side, which would check expired() before every access, throwing a catchable exception if expired. However, so far, my various attempts failed: either I couldn't manage to get the expected behavior, or I couldn't even get them to compile, etc.

A more obvious solution also based on shared_ptr is to use class_<Node, std::shared_ptr<Node>>(m, "Node") instead of our custom holder PyWeakPtr, but it does not work either because it would artificially extend the lifetime of nodes as long as some python variable points to them. I consider this a leak: nodes are not supposed to be shared, and if the user deletes a node in the UI, it should be gone and the destructor immediately called. The semantics really should be that accessing a "semantically-deleted" node in Python raises a Python error (without crashing the C++ program), not silently masquerading as a valid node.

If useful, I might take the time to clean/make_minimal/etc some of these attempts, but for now I'll leave the question as is for conciseness, Maybe some expert out here already have some useful insight, such as whether the whole approach even makes sense, etc. :)

Note that many existing software solving similar problems such as Qt's QDomDocument or Pixar's Universal Scene Description tend to use ref-counted weak references (more or less hidden internally) even in the C++ API, totally shielding from mistake C++ and Python clients alike. I'm open to this approach as well, although ideally, I think I'd prefer to stick with the general guideline of simply using non-owning raw pointers in C++.

Boris Dalstein
  • 7,015
  • 4
  • 30
  • 59
  • Did you made progress on this issue ? I am not sure that smart pointers are the best solution: one potential problem is that if you have a deep tree that goes out of scope from its root you may have a stack overflow due to "recursive" deletion of the smart pointers. This stack overflow risk could be mitigated with a custom Node destructor but that looks like a code smell. An alternative could be to have a Graph object that owns the nodes and uses keys (not pointers) that could be invalidated when a node is deleted. – Gabriel Devillers Aug 05 '22 at 07:51
  • [Somewhat related question](https://stackoverflow.com/questions/69851487/swig-simple-idiomatic-wrapper-usage-when-weak-ptr-are-used) I asked a while ago. – Gabriel Devillers Aug 05 '22 at 07:56
  • 1
    @GabrielDevillers Yes, I made progress: see the answer that I just added, which details the solution I adopted. – Boris Dalstein Aug 05 '22 at 14:48

1 Answers1

1

For anyone with a similar problem, I ended up implementing a custom solution using intrusive reference counting.

The data structure basically looks like this:

class Node {
    Node();

public:
    static NodePtr createRoot();

    Node* createChild();
    NodePtr removeChild(Node*);
    void insertChild(Node* child, Node* nextSibling);

    void destroy();

private:
    friend class NodePtr;
    std::atomic<size_t> refCount_;
    std::atomic<bool> isAlive_;

    Node* firstChild_;
    Node* lastChild_;
    Node* previousSibling_;
    Node* nextSibling_;
    Node* parent_;
};

class NodePtr {
public:
    NodePtr(Node*);
    NodePtr(NodePtr&);
    NodePtr(NodePtr&&);
    NodePtr& operator=(NodePtr&);
    NodePtr& operator=(NodePtr&&);
    ~NodePtr();

    Node* get() const;
    Node* operator->() const;
    Node& operator*() const;

private:
    Node* p_;
};

C++ client code can either manipulate raw pointers (non-owning), or use the smart-pointer NodePtr. Python code always manipulate them via a NodePtr. Whenever trying to dereference a NodePtr whose pointed-to node has isAlive_ == false, then a NotAlive exception is raised (translated to a Python exception if within a Python interpreter).

The owning pointer NodePtr increments/decrements the refcount of the pointed-to node, but also of all its ancestors. This means that a NodePtr to any node in the tree actually keeps alive the root of the tree. In other words, the data member refCount_ actually counts the number of NodePtr to this node or any of its descendants.

If a node is explicitly destroyed via destroy(), then the data member isAlive_ is set to false, but the memory for the object is not de-allocated right away. De-allocation only takes place when the refcount goes to zero. In a sense, this means that a NodePtr acts like an std::shared_ptr to the root of the tree, but like an std::weak_ptr to the pointed-to node itself.

This model comes with its own set of tradeoffs: for example, the necessity to increment/decrement the refcount of all ancestors of a node whenever a NodePtr is created/deleted, which can be costly in case of deep trees. An alternative can be to not perform the recursive ref counting, that is, refCount_ would only count the number of NodePtr to this node itself. However, this means that a NodePtr does not keep alive the root, and if such root was created from Python, it is important to keep a Python reference to the root otherwise the whole tree is destroyed when losing the Python reference to the root, even if there exist Python references to some of the children.

Boris Dalstein
  • 7,015
  • 4
  • 30
  • 59