4

XOR linked lists use pointer arithmetic in a way that looks suspicious to me given the changes in semantics introduced in C++17 (and discussed e.g. in Is a pointer with the right address and type still always a valid pointer since C++17?). Do they cause undefined behavior now? If so, can they be saved with launder?

EDIT:

The Wikipedia article contains just a short note about converting between pointers and integers. I tacitly assumed (and now am making it explicitly stated) that the pointers are first converted to and integer type of sufficient size to fit them, then XORing is done on the integers. The properties of XOR listed under Theory of operation thus guarantee that only integers obtained once from pointers will be converted back to them. The actual mapping from pointers to integers can be, per the standard, an arbitrary injection. I don't rely on any assumption beyond that.

Does the standard allow to use them and access the still existing objects? Before C++17? Since C++17?

ByteEater
  • 885
  • 4
  • 13
  • 5
    First question: Why would you want to? – tadman Jul 13 '23 at 22:08
  • As a particular C++ implementation, GCC [says](https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html) that you "may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic", but I'm unsure, if it's applicable to the situation with XOR. I think it very depends on the definition of "casting from pointer to integer and back again" – oficsu Jul 13 '23 at 22:10
  • In C I think you are fine if you use `uintptr_t`. Not sure if C++ is the same. – Nate Eldredge Jul 13 '23 at 22:43
  • 1
    @oficsu: That note is in reference to using arithmetic to produce a *different* pointer than the one you started with. In this case, the arithmetic gets us back the *same* pointer. – Nate Eldredge Jul 13 '23 at 22:43
  • The comments under the question claim that compilers didn't actually allow it (i.e. ignoring pointer provenance) before, so if it worked back then, it will surely work now. Side note: it seems the standard is slowly adjusting (relaxing) pointer/lifetime rules to match what the compilers are actually doing, not the other way around. – HolyBlackCat Jul 13 '23 at 22:50
  • I've edited the question and explained that I mean converting to and from integers and dereferencing as the operations involving pointers; the XORing is on integers (of a large enough type). – ByteEater Jul 14 '23 at 03:01
  • @tadman, it's an optimization, you keep one integer instead of two pointers at each node. Also, as a learning experience. – ByteEater Jul 14 '23 at 03:04
  • 1
    @tadman just to see if it can be done, of course :) – Jeremy Friesner Jul 14 '23 at 03:08

3 Answers3

5

It's implementation-defined, and still valid in C++17, at least for GCC. You cannot perform an xor operation between two pointers directly; you would have to go through reinterpret_cast<std::intptr_t>. The effect of this conversion (and back) is implementation-defined.

Implementation-defined means that the compiler must document what happens. What GCC provides is:

A cast from pointer to integer discards [...], otherwise the bits are unchanged.

A cast from integer to pointer discards [...], otherwise the bits are unchanged.

When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined.

See https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html

From this description, we can conclude that:

  1. the user ensures that the object at the address stays the same between storing pointers in an XOR list and retrieving them
    • if the pointed-to object changes during this process, casting the integer back to a pointer is undefined behavior
  2. converting one past the end pointers, null pointers, and invalid pointers back and forth isn't explained here, but "preserves the value" according to the C++ standard

Implications for an XOR-list

Generally, this should make XOR-lists implementable, as long as we reproduce the same pointers that we stored, and don't "rug pull" nodes while there are XORed pointers to them.

std::intptr_t ptr;

// STORE:
// - bit-cast both operands and XOR them
// - store result in ptr
ptr = reinterpret_cast<std::intptr_t>(prev) ^ reinterpret_cast<std::intptr_t>(next);

// LOAD:
// - XOR stored ptr and bit-cast to node*
node* next = reinterpret_cast<node*>(ptr ^ reinterpret_cast<std::intptr_t>(prev));
// valid dereference, because at the address 'next', we still store the same object
*next;

As stated in the documentation, next "must reference the same object as the original pointer", so we can assume that next is now a pointer to an object, if such a pointer was originally stored in ptr.

However, it would be UB if we stored the XORed next pointer, began the lifetime of a new object where next points to, and then un-XORed the address and converted back to a pointer type.

Jan Schultke
  • 17,446
  • 6
  • 47
  • 96
  • I've edited the question and explained that I mean converting to and from integers and dereferencing as the operations involving pointers; the XORing is on integers (of a large enough type). – ByteEater Jul 14 '23 at 03:01
  • @ByteEater I'm not sure what part of the question my answer doesn't address yet. Is there anything still unclear? – Jan Schultke Jul 14 '23 at 06:55
  • Your first sentence confused me. I thought you meant it only worked, and still does, in implementations which, like GCC, support some implementation-defined functionality beyond the standard (though allowed by it). – ByteEater Jul 14 '23 at 10:43
  • The remaining unclear part is: "if the pointed-to object changes during this process, casting the integer back to a pointer is undefined behavior". So if I had a pointer to `char` which equals `a`, cast the pointer to a sufficiently large integer type, changed the `char` value to `b`, then casting the integer back to a pointer would cause UB? – ByteEater Jul 14 '23 at 10:46
  • @ByteEater no, this would not be UB. Assignment only changes the value of an object, it doesn't replace the object. It would be UB if you started the lifetime of a new object in the same place, e.g. with `std::construct_at`. As for the first sentence: I cannot find documentation for MSVC and Clang which describes what happens here, so I can't say whether it's universally okay or not. – Jan Schultke Jul 14 '23 at 10:49
  • I've just understood, I think, what you actually mean by "implementation-defined" in your first sentence. Namely, that implementations aren't required to provide any integer types large enough to fit results of converting pointers. `uintptr_t` is optional. An extreme but still compliant possibility is that all pointer types are wider than all integer types, completely disabling conversion from pointers to integers. (If I'm wrong in any part of this reasoning, please correct me.) – ByteEater Jul 15 '23 at 11:16
  • @ByteEater I just meant that the conversion from integers to pointers and back is implementation-defined, with only the round-trip being valid if the integer is large enough. I suppose that it adds further issues if `std::uintptr_t` doesn't exist, but that is very unlikely. – Jan Schultke Jul 15 '23 at 12:26
  • Well, if there is a type that can fit results of conversion from pointers, then the actual mapping being implementation-defined isn't an issue. Whatever it is, the properties of XOR guarantee that the same integer is recreated which was converted from a pointer. If it's then converted back, it gives the same pointer. – ByteEater Jul 15 '23 at 21:02
  • @ByteEater it's just not so simple, even if no bits are lost and all the sizes are correct. For example, should `(T*) 0x123` be a pointer to the object at `0x123`, or should it be the one-past-the-end pointer for the object at `0x122`? Also, if the lifetime of a new object has begun at address `0x123`, will `0x123` use that "most recent" object once converted to a pointer? This would conflict with roundtrips preserving the value of a pointer. These two issues are the difficult questions here. – Jan Schultke Jul 16 '23 at 00:28
  • If I understand the standard correctly, a pointer to an object (either kept all the time as a pointer or transformed in a ny way that is guaranteed to give back the same pointer) cannot (on pain of UB) be used as a pointer to another object or a one-past-the-end (and vice versa), generally (though there are some exceptions). – ByteEater Jul 16 '23 at 18:27
  • Yes, it's not a syntactic restriction, but a semantic one, moreover, an undecidable one, which no compiler can have sound and complete way to detect (so soundness is required, but completeness just as best effort and quality of implementation matter). E.g. if you convert a pointer to an integer and then choose to convert back and dereference either this integer or some random one depending on a boolean obtained from an arbitrary computation, no alg. in the compiler can for all cases tell in advance (statically) what would happen (otherwise, the algorithm could be easily adapted to solve HP). – ByteEater Jul 16 '23 at 18:31
  • But it's not shocking to come across another way in which the question whether some code can casue UB is undecidable. – ByteEater Jul 16 '23 at 18:33
2

As far as I know, reinterpret_cast to and from std::uintptr_t should be fine.

DevSolar
  • 67,862
  • 21
  • 134
  • 209
Marshall Clow
  • 15,972
  • 2
  • 29
  • 45
  • I've edited the question and explained that I mean converting to and from integers and dereferencing as the operations involving pointers; the XORing is on integers (of a large enough type). – ByteEater Jul 14 '23 at 03:01
2

XOR linked lists are still valid in C++17 and beyond, assuming that the type uintptr_t exists.

While it's true that a pointer that represents a particular address is not necessarily a pointer to an object that resides at that address, there is [expr.reinterpret.cast]/5:

[...] A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined. [ Note: Except as described in 6.7.4.3, the result of such a conversion will not be a safely-derived pointer value. — end note ]

What this tells us is that, while reinterpret_cast may not in general give you a pointer to an object, there is a special case where the operand is an integer value that was previously obtained from reinterpret_casting a pointer operand. The pointer value resulting from the round trip is the original pointer value, meaning that if the original pointer value pointed to an object, the result also points to that object (assuming the object still exists, of course).

But, the note tells us that, in C++17, the result of the conversion may not be a "safely-derived pointer value". What that means is that once you perform the pointer to integer conversion and do not keep a copy of that integer, but instead only store its value XOR other integers, the implementation is allowed to garbage collect the pointee because (in some technical sense that I won't get into here) the pointee is no longer "reachable". When you later reconstitute the original integer value by another XOR operation, and then attempt to reinterpret_cast it back to the original pointer, that pointer value is not "safely-derived" and is therefore considered invalid under some theoretical implementations (because the implementation might have garbage collected it already). But, if your implementation has "relaxed pointer safety", then this doesn't matter; the pointer is still valid. The design intent was that garbage-collected implementations would define themselves as having "strict pointer safety".

In practice, no implementation actually has "strict pointer safety" as specified in the standard, even though some garbage-collected C++ implementations do apparently exist. For this reason, the concept of strict pointer safety will be abolished in C++23. You can rest assured that XOR linked lists are valid, assuming that uintptr_t exists.

Brian Bi
  • 111,498
  • 10
  • 176
  • 312
  • So, theoretically, XOR linked lists are guaranteerd to work accordind to versions of the standard before C++11 and will be guaranteed again by the upcoming C++23 standard? – ByteEater Jul 14 '23 at 23:21
  • Does it matter if I keep a copy of the integer (even in a place that's not used for anything, particularly not for conversion back to a pointer)? Or do I have to keep a copy of the actual pointer on (theoretical) implementations with strict pointer safety? – ByteEater Jul 14 '23 at 23:23
  • @ByteEater In implementations that have strict pointer safety (and, as I said, no such implementations exist) it doesn't really matter whether you keep a copy of the pointer around, because even still, `reinterpret_cast`ing from the reconstituted integer value doesn't yield a "safely-derived pointer". The implication of this is that such an implementation could optimize based on the assumption that the resulting pointer is not used to access an object ... – Brian Bi Jul 15 '23 at 00:25
  • @ByteEater ... without having global knowledge of whether some other safely-derived pointer still exists out there somewhere that points to the object. – Brian Bi Jul 15 '23 at 00:25