How to implement an intrusive linked list that avoids undefined behavior?

Question

For the third time in a few years I find myself needing an intrusive linked list for a project that does not allow boost (ask management...).

For the third time I find that the intrusive linked list implementation I have works perfectly, but that I really don't like that it makes use of undefined behavior - namely when converting the pointer to a list node into a pointer to the object containing that list node.

That awful code currently looks like this:

struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
};

template <typename T, IntrusiveListNode T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode & node) {
        return *(T*)(((char*)&node)-((size_t)&(((T*)nullptr)->*member)));
    }

    IntrusiveListNode root_;
};

I don't really care how ugly nodeToItem_ gets, but I would like to keep the public interface and syntax of IntrusiveList the same. Specifically, I would like to specify the type of a list type using IntrusiveList<Test, &Test::node_> rather than IntrusiveList<Test, offsetof(Test, node_)>.

It's almost 2016 - is there any way to do this without invoking undefined behavior?

Edit: There have been a few suggested solutions (involving different structures of the list) in the comments which I want to summarize here:

Live with undefined behavior, since the language has seemingly arbitrary limitations that prevent using member pointers in reverse.
Store an additional pointer to the containing class within IntrusiveListNode. This is currently probably the cleanest solution (no change in interface necessary), but does require a third pointer in every list node (small optimizations may be possible).
Derive from IntrusiveListNode and use static_cast. In boost this is the base_hook version of an intrusive linked list. I would like to stick with the member_hook version to avoid introducing multiple inheritance.
Store pointers to the next and previous containing class instead of to the next and previous list node within IntrusiveListNode. This makes creating the root node within the intrusive list difficult. Either the list must include a full instantiation of T (which is not possible e.g. if T is abstract), or the end of the list would need to be a null pointer (which would break --list.end(), allowing forward iteration only).
Boost intrusive lists have a member_hook version that works somehow, but the implementation has not been understood (and it possibly also relies on undefined behavior).

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

Coudnt parse your return statement. Check if https://github.com/arun11299/Cpp-Intrusive-list/blob/master/intrusive_list.h helps. I implemented it long time back. Not sure at what point I left it, but basic things should work. — Arunmu, Dec 07 '15 at 13:52
@ddriver No dynamic allocations at run time is the biggest drive behind using intrusive linked lists. Plus the fact that objects will be in several lists at the same time, though that could of course be solved using lists of pointers. — zennehoy, Dec 07 '15 at 13:56
@Arunmu Your implementation requires that items to be placed in the list must be derived from `ListNode`. This only allows items to be added to a single list. I need the node as member variant of an intrusive linked list. — zennehoy, Dec 07 '15 at 13:59
I guess I'm not seeing it but...: can't you use a `T*`s in your list node information and avoid the need to determine the outer node based on member? As far as I know there is no way in the C++ standard to obtain a containing object based on a member (things like `offsetof()` certainly don't work with non-standard layout types). — Dietmar Kühl, Dec 07 '15 at 14:02
@DietmarKühl Yes, that would be possible, and I have considered it, but it does increase the size of every list node by another pointer. Being able to use `offsetof` would definitely help, but unfortunately `offsetof` does not work with member pointers (only with the name of the member). — zennehoy, Dec 07 '15 at 14:06
@zennehoy: why would it increase the size of every list node by another pointer? The member pointer needed would, obviously, still be a template parameter. You'll need the link to the next node (and if the list isn't singly linked possibly also to a previous node) and I don't see what other pointer would be needed all of a sudden... — Dietmar Kühl, Dec 07 '15 at 14:10
@zennehoy: just had a look at what I did: the adjacent nodes wouldn't be directly accessible from each node - you'd use a suitable list or iterator object to traverse a specific list. Since your stated goals are no dynamic allocations and having objects in multiple lists that's still sufficient. — Dietmar Kühl, Dec 07 '15 at 14:16
@DietmarKühl Sorry, I misunderstood, I thought you meant a `T*` in addition to the next and previous pointers. The problem with using `T*`s is that you must hold an entire `T` as the root node of the list. This is not possible if `T` cannot be default constructed or is an abstract base class, and is wasteful if `T` is large. I have the abstract base class case, so I can't instantiate `T` as the root node. — zennehoy, Dec 07 '15 at 14:22
@zennehoy: You need to hold an entirey `T*` as the start of the list. I doubt you can do better than that to start with. The real constraint is that you can't navigate to another object without a context stating which member to use (and even that shouldn't be much of an issue, actually). — Dietmar Kühl, Dec 07 '15 at 14:25
@DietmarKühl How would you implement iterators (specifically, what would you use for the end node) if the list only contains a `T*`? What would `list.end()` return? Given an empty list, what would `list.begin()` return? Given a list with a single item, what would `++list.begin()` return? — zennehoy, Dec 07 '15 at 14:29
Since you have mentioned that you cannot use boost, I wonder if boost intrusive implementation do what you want to do ? If yes, better to take hint out of it. — Arunmu, Dec 07 '15 at 14:31
@ddriver Not sure what you mean? Offset for T based on what reference? I thought your comment asked in general why an intrusive linked list, rather than say std::list. — zennehoy, Dec 07 '15 at 14:34
@Arunmu Yes, though I suspect boost also uses the same undefined behavior that my implementation does (i.e., dereferencing a null pointer during pointer arithmetic to get a pointer to the containing object). If not, I'd like to know what boost does differently. — zennehoy, Dec 07 '15 at 14:36
The solution would to add a pointer in the `IntrusiveListNode` item to the item. What you do can hardly be done without UB. Intrusive linked list usually do that. — Guillaume Racicot, Dec 07 '15 at 14:50
@Gernot1976 Yes, that works due to the way VS 2015 implements `offsetof` as a macro. G++ uses a builtin, which doesn't work with member pointers unfortunately. — zennehoy, Dec 07 '15 at 15:23
Am I to assume that you cannot change the original class(es) whose objects you want to put in the list, I think class `T`? Are there objects of different types in the same list? — Peter - Reinstate Monica, Dec 07 '15 at 15:31
@Gernot1976 - there are a lot of things that happen to work, unfortunately, the standard does not guarantee they will work. So that's still UB + lucky implementation. The standard says `offsetof` with a non POD is UB. — dtech, Dec 07 '15 at 15:32
@PeterA.Schneider I certainly can change the classes containing the list nodes, otherwise I couldn't use an intrusive list. Ideally I don't want to put any part of the list implementation into `T` though (other than the list-node member(s)). All objects in the list are either `T`s or derived from `T`. `T` is generally an abstract class (i.e., has pure virtual methods). — zennehoy, Dec 07 '15 at 15:37
@zennehoy: you could use bog-standard null-pointers and a pointer to the list in an iterator (the latter is only needed when you need to work in both directions). I think you can also pull the usual "point to the pointer of the next node"-trick to avoid any special handling. — Dietmar Kühl, Dec 07 '15 at 15:41
How important was it that the linkage be a member and the template parameter identifying it by a member pointer? I understand the requirement that T contain a fixed number, often more than one, of the same linkage template. But why can't they all be base classes? Have a template parameter such as `int` to just make them unique and another for T, then you can static cast a linkage `&` to a `T&` — JSF, Dec 07 '15 at 15:42
I'm missing something (I'm sure). What keeps you from properly deriving the base T from IntrusiveListNode and dynamic_casting to it when you need the object back? (Or, possibly, have a proper abstract access function for the node in the base T which returns null if the member is not present in a derived class) -- in any case, why not using standard mechanisms? — Peter - Reinstate Monica, Dec 07 '15 at 15:42
@PeterA.Schneider `dynamic_cast` in this situation would be far more expensive than makes sense in the middle of obviously performance motivated tricky coding. `static_cast` should be good and I often choose to derive from the linkage object rather than have it be a member (to avoid these issues). The superficial problem with deriving from it is the desire to have more than one of them per T, but that isn't a fundamental problem. — JSF, Dec 07 '15 at 15:46
@zennehoy - maybe you could reduce the overhead of keeping a `T*` from one per node to one for all nodes, by grouping all nodes per object in a struct and having the `T*` be its member? Unlike `T`, the nodes struct will be a POD so you can get its pointer with `offsetof` — dtech, Dec 07 '15 at 15:47
http://www.boost.org/doc/libs/1_38_0/doc/html/intrusive/usage.html . This explains how boost does it using hooks. — Arunmu, Dec 07 '15 at 15:51
@JSF: I think mostly it's a general allergy against multiple inheritance, so I haven't really thought it through. The problem is that multiple inheritance would spread throughout the code (to any class that has multiple nodes or a base class and a node), whereas the current solution would be one single line of code that could be clearly commented with the intent and reasoning. I'll definitely give the multiple inheritance solution some more thought though. — zennehoy, Dec 07 '15 at 15:52
@Arunmu That's what I originally based my invasive list interface on a few years ago (`member_hook` being effectively the boost equivalent of `IntrusiveListNode`). I've tried wading through the boost code to figure out where pointers to the containing object come from, but at some point it got too deep for me. Assuming boost pulls this off without undefined behavior, it would be great to know how! — zennehoy, Dec 07 '15 at 16:00
@JSF Yes, that is one of the methods possible using boost. The other is a bit further down, using member hooks, which is what I would prefer to use. Unfortunately what I haven't been able to figure out from the boost implementation is how the containing object is retrieved given a list of such member hooks. — zennehoy, Dec 07 '15 at 16:11
Sorry I failed to delete my wrong comment fast enough. What was obvious from the start of that documentation turned out to be incorrect when I commented first and then decided to read more. After I deleted my wrong comment, your correct reply to it appeared, confusing whoever reads that afterward. — JSF, Dec 07 '15 at 16:13
@DietmarKühl Hopefully I'm just missing something obvious, and there is a nice solution along those lines, but how would `--list.end()` work? — zennehoy, Dec 07 '15 at 16:35
@zennehoy: I think there _is_ a nice solution (although I have only an implementation for a forward list but I think I know how to make it work with a bidirectional list; I hope to answer with a full solution later): the link would be asymmetric and contain a pointer to the next node and a pointer to the previous link. An iterator would point at the link whose `next` pointer indicates the current node — Dietmar Kühl, Dec 07 '15 at 16:40
Is it really necessary that your question took me 15min to understand with reading back and forth btw comments what the actual problem is? The question and problem itself would be really nice if the question allowed to easily grasp what's actually going on and what's actually undefined in your opinion. — Martin Ba, Dec 07 '15 at 16:47
@MartinBa Yes, there has been quite an exchange of ideas in the comments. I summarized the comment responses in the question. — zennehoy, Dec 07 '15 at 17:01
Boost uses the same thing you're already using. See [parent_from_member.hpp](http://www.boost.org/doc/libs/1_58_0/boost/intrusive/detail/parent_from_member.hpp), the `offset_from_pointer_to_member` is a custom compiler-dependent `offsetof` like what you already have. It has UB. — , Dec 07 '15 at 18:13
If you are already using templates and classes on the list class, why dont you use them for the node class, that will change the way you allocate your nodes. — umlcat, Dec 07 '15 at 18:13

score 9 · Accepted Answer · answered Dec 07 '15 at 20:37

I would side-step the problem and use a node<T> containing suitable members to link the range. To deal with a bidirectional, intrusive list I'd use an asymmetric node<T> like this:

template <typename T>
class intrusive::node
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    template <typename S, node<S> S::*> friend class intrusive::iterator;

    T*       next;
    node<T>* prev;
public:
    node(): next(), prev() {}
    node(node const&) {}
    void operator=(node const&) {}
};

The basic idea is that the list<T, L> contains a node<T> using the next pointer to point to the first element. That's fairly straight forward: given a pointer p to a T the link to the next node can be traversed using (p->*L).next. However, instead of directly navigating the list using T*, an iterator<T, L> actually uses a pointer to the node<T>: while that isn't necessary for forward traversal, it enables backward traversal and insertions anywhere in the list without special treatment of the list head.

The copy constructor and copy assignment are defined to do nothing to avoid half-inserted nodes when copying a node. Depending on the needs of the nodes it may be more reasonable to rather = delete these operations. However, that's unrelated to the question at hand.

The iterator just uses a pointer to the node<T> whose next member points at the current node. For the first element in the list this is a pointer to the list<T, L>'s node<T> member. Assuming you got a pointer to a suitable node<T> an iterator<T, L> can be created from that:

template <typename T, intrusive::node<T> T::*Link>
class intrusive::iterator
{
    template <typename S, node<S> S::*> friend class intrusive::list;
    node<T>* current;

public:
    explicit iterator(node<T>* current): current(current) {}
    T& operator*() { return *this->operator->(); }
    T* operator->() { return this->current->next; }
    bool operator== (iterator const& other) const {
        return this->current == other.current;
    }
    bool operator!= (iterator const& other) const {
        return !(*this == other);
    }
    iterator& operator++() {
        this->current = &(this->current->next->*Link);
        return *this;
    }
    iterator operator++(int) {
        iterator rc(*this);
        this->operator++();
        return rc;
    }
    iterator& operator--() {
        this->current = this->current->prev;
        return *this;
    }
    iterator operator--(int) {
        iterator rc(*this);
        this->operator--();
        return rc;
    }
};

Dereferencing just uses the next pointer. The same is true for forward iteration which uses the next pointer together with the member pointer to get hold of the address of the next node<T>. Since the iterator's prev already pointesr at a node<T> backward iteration just needs to replace the current node<T> with the prev element.

Finally, this leaves a list maintaining the beginning and the end of the list. Dealing with the bidirectional access and the corresponding access to the last node adds a bit of complexity and the need to actually have a dedicated node. Here is an implementation (which isn't thoroughly tested: I may have messed up some of the links):

template <typename T, intrusive::node<T> T::*Link>
class intrusive::list
{
    node<T> content;

public:
    list() { this->content.prev = &this->content; }
    iterator<T, Link> begin() { return iterator<T, Link>(&this->content); }
    iterator<T, Link> end() { return iterator<T, Link>(this->content.prev); }

    T& front() { return *this->content.next; }
    T& back() { return *(this->content.prev->prev->next); }
    bool empty() const { return &this->content == this->content.prev; }
    void push_back(T& node) { this->insert(this->end(), node); }
    void push_front(T& node) { this->insert(this->begin(), node); }
    void insert(iterator<T, Link> pos, T& node) {
        (node.*Link).next = pos.current->next;
        ((node.*Link).next
         ? (pos.current->next->*Link).prev 
         : this->content.prev) = &(node.*Link);
        (node.*Link).prev = pos.current;
        pos.current->next = &node;
    }
    iterator<T, Link> erase(iterator<T, Link> it) {
        it.current->next = (it.current->next->*Link).next;
        (it.current->next
         ? (it.current->next->*Link).prev
         : this->content.prev) = it.current;
        return iterator<T, Link>(&(it.current->next->*Link));
    }
};

Just for a bit of sanity: here is a function to simply print the list:

template <typename T, intrusive::node<T> T::*Link>
std::ostream& intrusive::operator<< (std::ostream& out, intrusive::list<T, Link>& list)
{
    out << "[";
    if (!list.empty()) {
        std::copy(list.begin(), --list.end(), std::ostream_iterator<T>(out, ", "));
        out << list.back();
    }
    return out << "]";
}

There are few other approaches avoiding the need to do any funky access to the enclosing class. The above avoids a couple of conditions. Assuming I managed to set the appropriate links correct the code would not rely on any implementation defined or undefined behavior.

You'd use the list like this:

class Node {
public:
    intrusive::node<Node> link0;
    intrusive::node<Node> link1;
    int                   n;
    Node(int n): n(n) {}
};
std::ostream& operator<< (std::ostream& out, Node const& n) {
    return out << n.n;
}

int main()
{
    intrusive::list<Node, &Node::link0> l0;
    intrusive::list<Node, &Node::link1> l1;

    Node n[] = { 10, 11, 12, 13, 14, 15 };

    l0.push_front(n[0]);
    l0.push_front(n[1]);
    l0.push_front(n[2]);

    l1.push_back(n[0]);
    l1.push_back(n[1]);
    l1.push_back(n[2]);

    std::cout << "l0=" << l0 << " l1=" << l1 << "\n";
}

Brilliant, this is exactly the answer I was hoping for! I noticed a small bug in the `erase` method, which currently returns an iterator two after the erased node, rather than directly after the erased node (`current` of the returned iterator should point to the node prior to the node that was erased, such that `current->next` will return the node after the one erased). A simple `return it;` does the trick. — zennehoy, Dec 08 '15 at 10:04

score 4 · Answer 2 · answered Dec 07 '15 at 17:56

4

The question remains: is it possible to make an intrusive member-based list with bidirectional iteration support, no undefined behavior, and no "unnecessary" memory overhead?

What you are trying to do is take a non-static data member of a C++ object, and convert it to a pointer to its containing class. In order to do that, you have to do some operation of the form:

node_ptr *ptr = ...;
auto p = reinterpret_cast<char*>(ptr) + offset;
T *t = reinterpret_cast<T*>(p);

To make this operation legal C++, you need all of the following to be well-defined:

Getting an byte offset from the particular NSDM for the node to the T that contains it.
Applying that offset to a pointer-to-a-member will result in a pointer value that is legal to cast to its owning type T.

Item 1 is only possible in well-defined C++ via offsetof; the standard provides no other way to compute that offset. And offsetof requires the type (in this case T) to be standard layout.

Of course, offsetof requires the name of the member as a parameter. And you can't pass parameter names through template arguments and the like; you have to do it through a macro. Unless you're willing to force the user to name the member in a specific way.

So there are your restrictions: T must be standard layout, and you have to either use a macro instead of a direct function call, or you must force the user to use a specific name for the member. If you do this, you should be safe, according to C++.

Here's what the code would look like:

struct intrusive_list_node
{
  intrusive_list_node *next;
  intrusive_list_node *prev;

  template<typename T, size_t offset> T *convert()
  {
    auto p = reinterpret_cast<char*>(this); //Legal conversion, preserves address.
    p -= offset; //Legal offset, so long as `offset` is correct
    return reinterpret_cast<T*>(p); //`p` has the same value representation as `T*` did originally, so should be legal.
  }
}

#define CONVERT_FROM_MEMBER(node, T, member_name) node->convert<T, offsetof(T, member_name)>()

answered Dec 07 '15 at 17:56

Nicol Bolas

449,505
63
781
982

2

Kind of offtopic, but I wonder what exactly is the argumentation for not supporting `offsetof` for all types. I mean the compiler obviously knows the exact object binary layout. – dtech Dec 07 '15 at 18:26
Boost currently does what the OP is doing for GNU, Intel etc. For MSVC it folllows the ABI to get the correct offset. Function to look 'offset_from_pointer_to_member'. – Arunmu Dec 07 '15 at 18:48
Maybe the need for standard ABI is the real answer to this question :) – Arunmu Dec 07 '15 at 18:49
@Arunmu - that would be great but is not likely to happen, for some reason people are very committed to supporting exotic platforms with odd rules that have long been extinct. I myself am currently implementing a programming language, and happily "limited" it to only supporting 99.99999999999% of all operational devices in existence :) – dtech Dec 07 '15 at 18:53
@ddriver: Thats really cool. But the other disappointment for me after going through boost code is knowing how much I do not know C++. Time to learn :) – Arunmu Dec 07 '15 at 19:05
2

@ddriver: The reason that `offsetof` is not allowed on "all types" is the same reason a pointer-to-data-member in the general case is much more than an offset: **virtual inheritance is evil and breaks many things**. To be exact, when virtual inheritance is involved, the compiler does NOT know the exact object layout, and must compute it at runtime with the help of virtual-base-subobject pointers. I have a feeling that "standard-layout" is unnecessarily restrictive, but there doesn't seem to be a term for "types not involving virtual inheritance". – Ben Voigt Dec 07 '15 at 19:34
"when virtual inheritance is involved, the compiler does NOT know the exact object layout, and must compute it at runtime" - isn't that just the adjustment of the pointer when up/down casting? The object layout is known, which is how the compiler knows how to adjust the object pointer if needed, as far as I know that is the only thing that happens during the runtime. The layout is only unknown in the context of a polymorphic reference, but in the case of the particular type it is know. There is certainly nothing making it impossible for `offsetof` to work, it does work in some implementations. – dtech Dec 07 '15 at 20:04
Pardon me if I am wrong, I am certainly not an expert, and don't think of it as nitpicking, and call me old-fashioned, but I just can't settle for the "what", I need the "why". What prevents the compile from being able to calculate the layout during compile time? Virtual inheritance essentially puts duplicates in one spot producing somewhat "funky" layout, compared to the standard, which demands some runtime lookups and adjustments, but it does so following set rules, I certainly don't see any unknowns when it comes to laying the object out. – dtech Dec 07 '15 at 20:29
@ddriver: `offsetof` is a macro, which means it's done at compile-time. Once you get virtual inheritance involved, the offset becomes a *runtime* computation. Obviously, you could have some kind of operation that could compute this offset at runtime, but that wouldn't be `offsetof`. – Nicol Bolas Dec 07 '15 at 20:47
@NicolBolas - whatever mess you create using virtual inheritance, the compiler will parse it, analyze it and lay it out. Virtual inheritance on its own doesn't happen during the runtime, it is a compile time construct. The calculations you talk about involve looking up offset values based on the runtime type information and adjusting pointers, those values that are being looked up themselves are calculated from the object layout during compile time - thus the layout is known for any particular type, therefore it should be possible to use `offsetof`. Where is the unknown? – dtech Dec 07 '15 at 23:49
Could you possibly elaborate what do you mean by "the compiler does NOT know the exact object layout, and must compute it at runtime"? Also answer the question I asked above - "What prevents the compile from being able to calculate the layout during compile time?". – dtech Dec 07 '15 at 23:52
@ddriver: Think of it like calling a virtual function. Given a pointer to a base class, the compiler *cannot know* which virtual function will be called. So it has to fetch a vtable pointer and get a function pointer out of that. Virtual inheritance is like that. Given a pointer to a class that virtually inherits from another, the compiler cannot know exactly where the virtual base classes are. Virtual inheritance works by giving the class a pointer to the actual location of the base class. I cannot explain this in any detail in a comment; Google how virtual inheritance is implemented. – Nicol Bolas Dec 08 '15 at 00:04
@NicolBolas - that's just the thing, I think I know how it works, I've implemented in myself for the language I am working on and it works, which is why what you say seems very odd and confusing to me. "Given a pointer to a class that virtually inherits from another, the compiler cannot know..." - that's exactly what I've been saying all along, the layout is not known if you use a polymorphic reference, but the layout of the particular actual type is known at compile time, so if it is known then `offsetof` should work, as it doesn't take in a vague pointer or reference, but the actual type. – dtech Dec 08 '15 at 00:12
@ddriver: OK, let's pretend for a moment that you're right. You can indeed statically get an offset of a virtually inherited member from just the type. OK so... what good is that to you? The purpose of the offset is to be able to apply it to a pointer to that type. Well, if you have a pointer, you do not *know* that it is a pointer to that type; it may be a pointer to a type *derived* from it. If the dynamic type of the object is not the type you got the offset for, then you can't do anything with that offset. – Nicol Bolas Dec 08 '15 at 00:17
@ddriver: Really, just [read this answer](http://stackoverflow.com/a/1130760/734069). – Nicol Bolas Dec 08 '15 at 00:19
I mean that the only reason the layout is unknown is because the actual type behind that pointer is not known. So what said that began this confusion - "compiler does NOT know the exact object layout, and must compute it at runtime" is technically wrong, the layout is not computed during the runtime, what happens during the runtime is establishing which is the actual type in order to know which already existing layout to use for it. – dtech Dec 08 '15 at 00:19
My question was strictly about `offsetof()` - since it is intended to use a concrete type as a parameter, what's preventing it from working with a non-standard layout object? That's what the documentation says - it is used to determine the offset of a member in a given type, it doesn't even say anything about polymorphic usage, so I am puzzled why your whole thesis revolves around something practically unrelated. There is no reason why it shouldn't be able to work in the context it was intended to be used. – dtech Dec 08 '15 at 00:25
Never mind that, turns out virtual inheritance and object layout have nothing to do with the limitations of `offsetof` - it is due to its clumsy implementation and inherent inability to fully inspect types, so even when the layout is known it does it no good, since it doesn't have access to that information. Something as simple as a private member breaks it, no need for anything nearly as fancy as virtual inheritance and funky object layouts. It also explains why I had no problem to implement it - since in my case it has full access to the type meta data. – dtech Dec 08 '15 at 00:36

Fabian Knorr · Answer 3 · 2015-12-07T18:12:04.787

If you don't mind changing the IntrusiveListNode type, you can have a node contain a handle pointing to the previous / next node - you'll only have to do the node -> handle lookup, not the reverse.

template<typename Node>
struct IntrusiveListHandle {
    Node *next = nullptr;
    // and Node* prev, etc ...
};

template<typename Node, IntrusiveListHandle<Node> Node::*handle>
struct IntrusiveList {
    Node *first;    

    static Node *next(Node *n) {
        auto h = (n->*handle).next;
    }
};

Usage example:

#include <iostream>

struct Test {
    IntrusiveListHandle<Test> handle;
    std::string value;

    Test(const std::string &v): value(v) {}
};

template<typename IntrusiveList>
void print(const IntrusiveList &list) {
    for (Test *n = list.first; n; n = list.next(n)) {
        std::cout << n->value << "\n";
    }
}

int main() {
    Test hello("hello");    
    Test world("world!");
    hello.handle.next = &world;
    IntrusiveList<Test, &Test::handle> list;
    list.first = &hello;
    print(list);
}

You should avoid undefined behaviour at all costs as compilers are getting smarter and smarter in exploiting UB for optimization - code that works fine now may suddenly break with the next compiler update.

I see that you mentioned reverse iteration. --end() would not work with this code, but the usual approach is to provide both a begin()/end() and an rbegin()/rend() pair to allow reverse iteration.

Jens · Answer 4 · 2015-12-07T16:23:20.270

1

I think you can achieve the benefits using CRTP:

#include <iostream>
using namespace std;

template<typename T>
struct ListNode
{
    ListNode<T>* next;

    // this would be nodeToItem in the list class
    T* value()
    {
        return static_cast<T*>(this);
    }
};

// This would be your abstract base class
struct A: public ListNode<A>
{
    A(int i): x(i) {}
    virtual ~A() = 0;
    int x;
};

inline A::~A() {}

struct B: public A
{
    B(int i): A(i) {}
    virtual ~B() {}
};

template<typename T>
class IntrusiveList {
public:
IntrusiveList(ListNode<T>* ptr): root(ptr) 
{
    ptr->next = nullptr;
}

void append(ListNode<T>* ptr)
{
    ptr->next = root;
    root = ptr;
}

ListNode<T>* begin() {return root;}
private:
ListNode<T>* root;
};

int main() {
    B b(10);
    B b2(11);
    IntrusiveList<A> l(&b);
    l.append(&b2);

    for(ListNode<A>* n=l.begin(); n != nullptr; n = n->next)
    {
         std::cout << n->value()->x << std::endl;
    }
    return 0;
}

Having the elements in more than one list should be possible by using an array of ListNode pointers in the struct, and passing the index of the array to the list class as either a template parameter or constructor argument. Iterator would also need to store the index in the ListNode array.

edited Dec 07 '15 at 16:23

answered Dec 07 '15 at 16:18

Jens

9,058
2
26
43

The OP's objections to that design were implied in the question and well explained in the comments before you posted that answer. – JSF Dec 07 '15 at 16:27
This is basically the base_hook version of the intrusive linked list, where the data class is derived from the list node. I'm looking for a solution given a member_hook, where the list node is a member of the data class. Also, given iteration as you show involving a `nullptr` end, the list root and all next pointers could just be `T*`s. – zennehoy Dec 07 '15 at 16:29
1

@JSF I haven't seen any discussion of that approach. The question just says he wants an intrusive list without UB, and I think the solution offers this. The comments then discuss details of the "offset-approach", but not CRTP. – Jens Dec 07 '15 at 16:30
I was using CRTP for years before I ever learned that it was called that. The discussion of having the node be a base class (rather than member) of T implied CRTP because it was still clear and necessary that the node be templated on T. In an existing discussion of a base class templated on the derived class, "CRTP" is just a buzzword, not a new answer. – JSF Dec 07 '15 at 18:20
@JSF: Since your goal in using member_hook is "avoid multiple inheritance", I'd like to point out that you can do that here too with a slight modification: `template struct ListNode : TBase` and then `struct A : Base` becomes `struct A : ListNode` – Ben Voigt Dec 07 '15 at 18:24
@BenVoigt, that wasn't my goal. I'm not the OP here, just someone else doing similar things. I think you missed the point in a couple ways and are getting even further from the OP's requirements (but that should be more for the OP to comment on than for me to further mind read the OP). – JSF Dec 07 '15 at 19:25
@JSF: I got that from bullet point #3 of the question (sorry for confusing you with OP) and it seems like a simple point so I'm surprised to hear you say I missed it. – Ben Voigt Dec 07 '15 at 19:28

Guillaume Racicot · Answer 5 · 2015-12-07T15:23:46.793

-1

You can Hardly get the original object with the pointer of one of it's member without invoking UB. Why you absolutely can't? Because a IntrusiveListNode can be hold anywhere. There is no clues that a specific IntrusiveListNode is held in a T and another proof you can't do that: The compiler can't know if the node sent to your function is really held in a T. What you are trying to do is undefined behaviour. The right way to do this would be to add a pointer to it's container in IntrusiveListNode.

template<typename T>
struct IntrusiveListNode {
    IntrusiveListNode * next_;
    IntrusiveListNode * prev_;
    T* item_;
};

template <typename T, IntrusiveListNode<T> T::*member>
class IntrusiveList {
// snip ...
private:
    T & nodeToItem_(IntrusiveListNode<T> & node) {
        return *(node->item_);
    }

    IntrusiveListNode<T> root_;
};

If you can't use template for IntrusiveListNode, you can use void* instead of T*

You can see an example of an implementation of an intrusive linked list here

edited Dec 07 '15 at 15:23

answered Dec 07 '15 at 14:58

Guillaume Racicot

39,621
9
77
141

1

The question doesn't ask to prove that the `IntrusiveListNode` is really a member of `T`, it assumes that it is, and an answer that has defined behaviour if and only if it really is would be wholly appropriate. Your approach works, but at the cost of an additional data member that should not be needed. Besides, by your logic, this is still totally horribly broken: how can you prove that the `T` takes care to set the list node's `item_` properly? You can't. You're assuming the class does this properly. Such assumptions are perfectly valid, but they're equally valid when they're made by the OP. – Dec 07 '15 at 15:03
This code I probably the best to do what the OP asked for, get rid of the undefined behaviour. He didn't ask for the `IntrusiveListNode` to be same size in the question, just the public interface to be the same. That's why I proposed to change `T*` to a `void*` to keep public interface the same. The downvote is not justified. – Guillaume Racicot Dec 07 '15 at 15:10
Quoting one of the OP's comments, when thinking that someone else was suggesting the same thing as in your answer: "Yes, that would be possible, and I have considered it, but it does increase the size of every list node by another pointer." I do believe the OP is looking for the `IntrusiveListNode` not to increase in size. – Dec 07 '15 at 15:18
Even after the edit, your basic point about the member vs outer class relationship would apply exactly the same way between a base class and outer class. In a static cast from base class to outer class, the compiler must trust the programmer that this base class was actually a part of the right outer class. The compiler's job in navigating from base class to outer object is identical to navigating from member to outer object. The problem seems to be in the language arbitrary rules, rather than anything wrong with the requested operation. – JSF Dec 07 '15 at 15:22
sorry, forgot to change the content of the function. – Guillaume Racicot Dec 07 '15 at 15:24
@JSF then I guess there in no possible answer for this question. – Guillaume Racicot Dec 07 '15 at 15:29
When someone with a lot more language lawyer credibility tells us there is no standards compliant way to do the job that Microsoft's `offsetof` macro is supposed to do, I won't have too much trouble believing it. But until then I'll translate your "no possible answer" into "neither of us know the answer". – JSF Dec 07 '15 at 15:35

score -2 · Answer 6 · answered Dec 15 '15 at 13:59

With templates it is hard to do. It is possible with macro's, so the needed members _next, _prev etc are in the scope of the object itself, not inside a separate template object. Using the macro's it is possible to avoid typing in code that is very similar every time. In fact I created a Case tool "ClassBuilder" (http://sourceforge.net/projects/classbuilder/) years ago that writes code using macro's to create data structures that uses intrusive linked lists. In the area I work, the normal templated linked list stuff is just way to slow. In our business, it is normal to work with very large in core data structures which are also very dynamic. Thus lots of removals and additions and searches on the lists. With the tool you totally abstracts from the actual implementation, you just create class diagrams and generate the code from there. In a relative simple test case we did, the run time performance of the generated code was 40s and 400s for a C++ solution using the "normal" STL kind of implementation. A C# implementation of the same test case was aborted after a few hours of running. Its implementation was similar to teh STL one, but this one was hit very hard by the Garbage Collector. Due to the dynamic behavior of the test case all memory that could be reclaimed could only be reclaimed in a full scan.

How to implement an intrusive linked list that avoids undefined behavior?

6 Answers6