2

Does the following snippet utilize undefined/unspecified/etc. behavior?

```cpp
#include <cstddef>
#include <iostream>
#include <string>

class Test {
    std::string s1{"s1"}, s2{"s2"};
    std::ptrdiff_t offset = (char*)(&s2) - (char*)(this);
public:
    std::string& get() { return *(std::string*)((char*)(this) + offset); }
};

int main() {
    Test test;
    std::cout << Test{test}.get(); // note the copy
}
```

The purpose of the offset is to point at either `s1` or `s2` (chosen at runtime) without requiring any special logic for copying, moving, or accessing. `std::string` here just stands in for any class that is non-trivial in every respect.

passing_through
  • I don't understand what you're trying to solve. If you're having a "sentry" value anyways, just make that sentry (your offset member) a pointer in its own right that points directly to what you want to access. Otherwise, isn't this just a weird riff on the [offsetof](https://en.cppreference.com/w/cpp/types/offsetof) macro, with possibly not the right pointer arithmetic? – Kevin Anderson Mar 25 '21 at 13:27
  • There's the [`offsetof`](https://en.cppreference.com/w/cpp/types/offsetof) macro for this. But it is only conditionally supported for types which are not standard layout (and your class is not standard layout). – interjay Mar 25 '21 at 13:27
  • Instead of `std::ptrdiff_t offset = (char*)(&s2) - (char*)(this);`, use `std::string * alias = &s2;` and then `get` becomes `return *alias;` – NathanOliver Mar 25 '21 at 13:29
  • @NathanOliver however, the pointer would have to be reset after a move/copy, which is what I want to avoid. – passing_through Mar 25 '21 at 13:31
  • @passing_through Use a member pointer instead of an object pointer. Then, it doesn't need to be reset. – François Andrieux Mar 25 '21 at 13:32
  • @FrançoisAndrieux Was just about to suggest that. OP, see this for confirmation that it works: https://stackoverflow.com/questions/29125085/class-copy-constructor-and-pointer-to-member-functions – NathanOliver Mar 25 '21 at 13:33
  • @KevinAnderson I'm trying to make a sentry pointer of that kind with no need to reset it after copying/moving. – passing_through Mar 25 '21 at 13:34
  • @interjay yes, the non-standard layout is intended. – passing_through Mar 25 '21 at 13:41
  • The getter is already operating on the current `this` instance, so why not merely `std::string& get() { return flag ? s1 : s2; }`? – Eljay Mar 25 '21 at 15:06
  • @Eljay I'm looking for a solution "containing no special logic for copying/moving/accessing". So, branching in the getter is not welcome. – passing_through Mar 25 '21 at 15:39
  • Ah, ok. `*(std::string*)((char*)(this) + offset)` looked like *on par* special logic to me, as well as being undefined behavior. – Eljay Mar 25 '21 at 15:42
  • @Eljay well, my bad :) It is (was supposed to be) just either `this->s1` or `this->s2` previously (!) chosen at runtime, so we don't get something like an `if` per access when using a similar pattern for, say, iterating/indexing. – passing_through Mar 25 '21 at 15:48
  • Change my imagined `bool flag` member to `std::string* selected_s` and then `std::string& get() { return *selected_s; }` Implement the appropriate copy ctor & assignment operator. Boom, done! Or what François suggested. – Eljay Mar 25 '21 at 15:52
  • @Eljay I rejected the self-referencing member idea because it required user-defined copying and moving. François's suggestion is what I like most; let's see if anything else comes up. – passing_through Mar 25 '21 at 16:08
  • Does this answer your question? [Is it UB to access a member by casting an object pointer to \`char \*\`, then doing \`\*(member\_type\*)(pointer + offset)\`?](https://stackoverflow.com/questions/62329008/is-it-ub-to-access-a-member-by-casting-an-object-pointer-to-char-then-doing) – Language Lawyer Mar 25 '21 at 16:15
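
The copy problem with a plain object pointer, raised in the comments above, can be illustrated with a minimal sketch. `AliasTest` and `check_alias_copy` are hypothetical names introduced here for illustration; they are not part of the question's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical class using the object-pointer variant from the comments.
struct AliasTest {
    std::string s1{"s1"}, s2{"s2"};
    std::string* alias = &s2;  // points into one particular object

    std::string& get() { return *alias; }
};

// Checks that an implicitly generated copy keeps pointing at the original.
inline void check_alias_copy() {
    AliasTest original;
    AliasTest copy = original;  // the pointer value is copied as-is

    // The copy's pointer still targets original.s2, not copy.s2.
    assert(&copy.get() == &original.s2);
    assert(&copy.get() != &copy.s2);
}
```

Calling `check_alias_copy()` passes because the implicit copy constructor copies the pointer value rather than re-evaluating the default member initializer. This is why the variant would need user-defined copy and move operations.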

2 Answers

4

Your proposed solution contains multiple instances of Undefined Behavior related to pointer arithmetic.

First, `(char*)(&s2) - (char*)(this)` is Undefined Behavior. This expression is governed by [expr.add]/5. Since the pointers aren't `nullptr` and they don't point to elements of the same array, the behavior is undefined.

Second, `((char*)(this) + offset)` is Undefined Behavior. This time the applicable paragraph is [expr.add]/4. Since `(char*)(this)` isn't an element of an array, the only legal value for `offset` would be 0. Any other value is Undefined Behavior.

But C++ already provides the tool necessary to solve the problem you are describing: the pointer to data member. Such a pointer points to a member of a type instead of a member of an instance. It can be combined with a pointer to an instance (in this case the `this` pointer) to get a normal object pointer.

Here is your example modified to use a pointer to data member (https://godbolt.org/z/161vT158q):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

class Test {
    std::string s1{"s1"}, s2{"s2"};

    // A pointer to an `std::string` member of the type `Test`
    using t_member_pointer = std::string Test::*;

    // Points to `Test::s2`
    t_member_pointer s_ptr = &Test::s2;

public:
    std::string& get() {
        // Combine the data member pointer with an instance to get an object
        return (this->*s_ptr);
    }
};

int main() {
    Test test1;
    Test test2 = test1; // note the copy
    std::cout << test2.get();
}
```

Notice that `s_ptr` points to `Test::s2` and not `this->s2`. The value of a data member pointer is independent of any instance, so it is compatible with every instance of that type. It therefore does not need to be corrected during a copy or move; it behaves as expected when simply copied by value between instances.
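
Since the question asks for the member to be chosen at runtime, the same idea can be sketched with the choice made in a constructor. `Selector`, its `use_second` parameter, and `check_selector` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical variant of Test whose target member is chosen at runtime.
class Selector {
    std::string s1{"s1"}, s2{"s2"};
    std::string Selector::* selected;  // instance-independent member pointer

public:
    explicit Selector(bool use_second)
        : selected(use_second ? &Selector::s2 : &Selector::s1) {}

    std::string& get() { return this->*selected; }
};

// Checks that implicit copy and move keep the selection per instance.
inline void check_selector() {
    Selector a{true};
    Selector b = a;             // implicit copy constructor
    b.get() = "changed";        // writes to b's own s2
    assert(a.get() == "s2");    // a is unaffected

    Selector c = std::move(a);  // implicit move constructor
    assert(c.get() == "s2");
    assert(&b.get() != &c.get());
}
```

No user-defined copy or move operations are needed here, and `get()` contains no branch: the runtime choice is made once, in the constructor.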

François Andrieux
  • "*Since `(char*)(this)` isn't an element of an array*" - well, technically any variable/object instance can be treated as an array of 1 element, where pointer arithmetic is concerned. – Remy Lebeau Mar 25 '21 at 16:28
  • _Since (char*)(this) isn't an element of an array, the only legal value for offset would be 0_ Since `(char*)(this)` doesn't point to an object of type `char`, even 0 is not a legal offset, see [expr.add]/6 – Language Lawyer Mar 25 '21 at 16:40
2

No, this is not valid: the difference between two pointers is defined only for pointers into the same array:

Only pointers to elements of the same array (including the pointer one past the end of the array) may be subtracted from each other.

https://en.cppreference.com/w/cpp/types/ptrdiff_t

Different members of a class are not elements of the same array, so this condition is not met.

Jeffrey
  • Note that this isn't a property of `ptrdiff_t`, it is a fact about pointer arithmetic in general. – François Andrieux Mar 25 '21 at 13:23
  • @FrançoisAndrieux so, is there no legit way to implement the idea via standard C++? – passing_through Mar 25 '21 at 13:25
  • @passing_through No, put your members in the same array or create a separate array of pointers to the relevant members. Consider a `constexpr` member pointer array. – François Andrieux Mar 25 '21 at 13:27
  • there is some macro voodoo [`offsetof`](https://en.cppreference.com/w/cpp/types/offsetof) though I never understood if or how it can be used for such case – 463035818_is_not_an_ai Mar 25 '21 at 13:29
  • @largest_prime_is_463035818 As far as I understand it, the uses for `offsetof` are limited. I don't believe you can use it to get back the member. Edit: It will give you the correct integer to add to `this`, but then trying to perform that pointer arithmetic should be UB, so knowing the offset is not usually useful. – François Andrieux Mar 25 '21 at 13:30
  • @largest_prime_is_463035818 • the macro-voodoo in `offsetof` leverages insider knowledge of the compiler implementation for the platform. So what may appear to be undefined behavior in the macro is relying on intimate compiler implementation details. There are also limits on the use-cases where `offsetof` is allowed, and if those are violated the voodoo may also result in undefined behavior. – Eljay Mar 25 '21 at 15:03
  • @Eljay yes, I am fine with `offsetof` making use of implementation details, what I don't understand is what it can be used for, seems like not much. Maybe some extra rules for standard layout or something like that apply – 463035818_is_not_an_ai Mar 25 '21 at 15:07
  • @largest_prime_is_463035818 • Ahh, I misunderstood your gist. Yes, I agree with you on that point. – Eljay Mar 25 '21 at 15:07
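
The `constexpr` member pointer array mentioned in the comments above is not spelled out there. One possible reading, as a sketch only: keep a small table of member pointers and store just an index in each object. `Indexed`, `MemberPtr`, `members`, and `check_indexed` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class that stores an index into a table of member pointers.
class Indexed {
    std::string s1{"s1"}, s2{"s2"};
    std::size_t index;  // plain data, so implicit copy/move is enough

public:
    explicit Indexed(std::size_t i) : index(i) { assert(i < 2); }

    std::string& get() {
        using MemberPtr = std::string Indexed::*;
        // Table shared by all instances; the entries are compile-time constants.
        static constexpr MemberPtr members[] = {&Indexed::s1, &Indexed::s2};
        return this->*members[index];
    }
};

// Checks that a copy resolves the index against its own members.
inline void check_indexed() {
    Indexed a{1};
    Indexed b = a;  // implicit copy constructor
    assert(b.get() == "s2");
    assert(&a.get() != &b.get());
}
```

Compared with storing the member pointer directly, this trades a table lookup per access for a smaller per-object footprint; which of the two the comment had in mind is an assumption here.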