Aliasing struct and array the C++ way

Question

This is a C++ follow-up to another question of mine.

In the old days of pre-ISO C, the following code would have surprised nobody:

struct Point {
    double x;
    double y;
    double z;
};
double dist(struct Point *p1, struct Point *p2) {
    double d2 = 0;
    double *coord1 = &p1->x;
    double *coord2 = &p2->x;
    int i;
    for (i=0; i<3; i++) {
        double d = coord2[i]  - coord1[i];    // THE problem
        d2 += d * d;
    }
    return sqrt(d2);
}

Unfortunately, this problematic line uses pointer arithmetic (p[i] being by definition *(p + i)) outside of any array, which is explicitly not allowed by the standard. Draft N4659 for C++17 says in 8.7 [expr.add]:

If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.

And the (non-normative) note 86 makes it even more explicit:

An object that is not an array element is considered to belong to a single-element array for this purpose. A pointer past the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n] for this purpose.

The accepted answer of the referenced question uses the fact that the C language accepts type punning through unions, but I could never find the equivalent in the C++ standard. So I assume that a union containing an anonymous struct member and an array would lead to Undefined Behaviour in C++ — they are different languages...

Question:

What could be a conformant way to iterate through members of a struct as if they were members of an array in C++? I am searching for a way in current (C++17) versions, but solutions for older versions are also welcome.

Disclaimer:

It obviously only applies to elements of same type, and padding can be detected with a simple assert as shown in that other question, so padding, alignment, and mixed types are not my problem here.
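
The padding check mentioned in the disclaimer could look like the sketch below. It is not copied from the other question; it assumes that `offsetof` plus a size comparison is an acceptable way to express the check, and `point_is_packed` is a hypothetical helper name. These assertions only detect padding. They do not make the pointer arithmetic in `dist` well-defined.

```cpp
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

struct Point {
    double x;
    double y;
    double z;
};

// offsetof is only unconditionally supported for standard-layout types.
static_assert(std::is_standard_layout<Point>::value,
              "Point must be standard-layout");

// Each member must start exactly one double after the previous one...
static_assert(offsetof(Point, x) == 0, "unexpected padding before x");
static_assert(offsetof(Point, y) == sizeof(double), "unexpected padding before y");
static_assert(offsetof(Point, z) == 2 * sizeof(double), "unexpected padding before z");
// ...and there must be no trailing padding either.
static_assert(sizeof(Point) == 3 * sizeof(double), "unexpected trailing padding");

// Hypothetical helper exposing the same layout check as a value.
constexpr bool point_is_packed() {
    return offsetof(Point, y) == sizeof(double)
        && offsetof(Point, z) == 2 * sizeof(double)
        && sizeof(Point) == 3 * sizeof(double);
}
```

If any of the assertions fails, the build stops, which is the desired outcome for code that relies on the layout.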

Why are you forced to do this instead of using a C++-specific or compliant solution? You didn't say that in your question. – Iharob Al Asimi, Jan 26 '18 at 14:42
You didn't consider `double *coord1 = &p1.x;` a problem? There's no promise from the compiler to not add any padding between your members, meaning there's no guarantee that coord1[1] is y.... hence the question. Derp – UKMonkey, Jan 26 '18 at 14:52
_"What could be a conformant way to iterate through members of a struct as if they were members of an array in C++?"_ There isn't one. That's why these are two separate constructs, not just one. Use the one appropriate for the task, end of – Lightness Races in Orbit, Jan 26 '18 at 15:08
@AnttiHaapala: I added the C tag, because the original code is old C, and I have confirmed that the C solution could not be used. But that last point could be a reason for removing it too... – Serge Ballesta, Jan 26 '18 at 15:13
@LightnessRacesinOrbit: The example code uses a Point object where the common way is to use different members for the different coordinates, **except** when computing a distance, where the array way requires less code duplication and is less error prone due to the DRY principle. I know there are tons of ways to avoid that, but it used to be accepted in old C versions, I could find a way in C, and I just wondered how to use that in C++ – Serge Ballesta, Jan 26 '18 at 15:16
@SergeBallesta: I prefer to adhere to the "don't have UB in your program" principle - fortunately, so do you, which is why we're here :P – Lightness Races in Orbit, Jan 26 '18 at 15:18
*In the old days of pre-ISO C, the following code would have surprised nobody:* Maybe I wouldn't have been surprised, but I'd have called that crap back then, too. [Expletive Deleted] hacks aren't excused by time. What you've presented is simply horrible code that managed to "work" in spite of itself. It's not an array - why do you want to treat it as such? – Andrew Henle, Jan 26 '18 at 15:55
Does any C++ language-lawyer know if there's an equivalent to C11 6.3.2.3/7 in C++? If so, one can implement a hack according to this: `"When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object."` – Lundin, Jan 26 '18 at 16:04
@AndrewHenle: the original version of K&R C intended to allow programmers to directly use low level constructs. There was no standard to forbid aliasing, and the compiler just trusted the programmer to know what he had written. You could easily find worse code if you look at what we used to write in the early 80's... – Serge Ballesta, Jan 26 '18 at 16:05
@Lundin: Good question. I thought that 6.9 Types [basic.types] note 45, *The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C*, explicitly allowed it. But Oliv has found [core issue #1701](http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#1701), which suggests it is at least uncertain – Serge Ballesta, Jan 26 '18 at 16:09
The simplest solution in my mind is to make `Point` contain an `array` to begin with, and use accessor methods to mimic `x`, `y`, and `z`. – jxh, Jan 26 '18 at 21:16
@jxh I agree. Moreover, nowadays, compilers are smart enough to generate the same assembly: https://godbolt.org/g/WBJV4e – Bob__, Jan 27 '18 at 16:08
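
The suggestion in the last two comments (store an array, expose named accessors) might be sketched as follows. This is only an illustration of the idea, not code from the thread or from the linked Compiler Explorer page; the member name `c` and the accessor signatures are assumptions.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical variant of Point: the storage is a real array,
// and accessor functions give the elements their usual names.
struct Point {
    std::array<double, 3> c;

    double& x() { return c[0]; }
    double& y() { return c[1]; }
    double& z() { return c[2]; }
    const double& x() const { return c[0]; }
    const double& y() const { return c[1]; }
    const double& z() const { return c[2]; }
};

// Indexing c is ordinary array access, so no pointer arithmetic
// across distinct members is involved.
double dist(const Point& p1, const Point& p2) {
    double d2 = 0;
    for (std::size_t i = 0; i < p1.c.size(); ++i) {
        const double d = p2.c[i] - p1.c[i];
        d2 += d * d;
    }
    return std::sqrt(d2);
}
```

The cost is that existing code has to write `p.x()` instead of `p.x`, which is exactly the change to the struct that the question hoped to avoid.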

Tobi · Accepted Answer · 2018-01-26T15:11:36.820

29

Use a constexpr array of pointers-to-member:

#include <math.h>

struct Point {
    double x;
    double y;
    double z;
};

double dist(struct Point *p1, struct Point *p2) {
    constexpr double Point::* coords[3] = {&Point::x, &Point::y, &Point::z};

    double d2 = 0;
    for (int i=0; i<3; i++) {
        double d = p1->*coords[i] - p2->*coords[i];
        d2 += d * d;
    }
    return sqrt(d2);
}

edited Jan 26 '18 at 15:11

answered Jan 26 '18 at 15:05

Tobi

Mmmmm tasty indirection – Lightness Races in Orbit Jan 26 '18 at 15:08
Nice idea. However, have you checked whether current compilers can successfully optimize away the member pointers? If this generated code that actually has to load the offset from memory, it would be a major performance drain. – cmaster - reinstate monica Jan 26 '18 at 15:13
@cmaster: clang gives the same assembly output for my code and the original C version https://godbolt.org/g/xhT1pq – Tobi Jan 26 '18 at 15:26
Really nice trick that is immune to possible padding while remaining simple to read and write, and that does not require any change to a POD struct. – Serge Ballesta Jan 26 '18 at 15:46
And this produces the exact same [assembly](https://godbolt.org/g/K8r9Qr). – Oliv Jan 26 '18 at 18:36
@cmaster - It's a good point. But in real-world live code where you care **that** much about it compiling optimally, I don't see why the coder wouldn't instead redesign the class/struct to not have this issue in the first place (e.g. make it an array, or a struct with an array or something). – T.E.D. Jan 26 '18 at 19:35
@T.E.D. Well, the point is usually that `someVector.z` is much more readable to most of us than `someVector[2]`, or even `someVector[z]` (if you had a constant `z` defined somewhere). There are simply some applications where you want your vectors to be arrays, and some other applications where you want your vectors to be structs. Been there, made a dirty hack around it, and not proud of the result. But I still believe that my dirty hack was better than using array notation where not applicable, or struct notation where not applicable. This answer offers a much better solution. – cmaster - reinstate monica Jan 26 '18 at 20:44
@cmaster - I guess put me in your minority then. I don't think it makes a huge difference in readability, and honestly prefer the array form in **any** situation where the container contents are homogeneous for exactly these reasons. – T.E.D. Jan 26 '18 at 21:22
What does `double Point::* coords[3]` mean? – Nanashi No Gombe May 20 '20 at 15:41
@NanashiNoGombe Pointer to member; see https://stackoverflow.com/a/63284852/1968 – Konrad Rudolph Aug 06 '20 at 14:15
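
For readers with the same question as in the last two comments: `double Point::*` is the type "pointer to a `double` data member of `Point`", and `.*` / `->*` apply such a pointer to a particular object. A minimal sketch follows; the names `second`, `sum` and `set_second` are made up for illustration and do not appear in the answer.

```cpp
struct Point {
    double x;
    double y;
    double z;
};

// A pointer to member names *which* member, not where an object lives.
constexpr double Point::* second = &Point::y;

// An array of three such pointers, declared as in the answer.
constexpr double Point::* coords[3] = {&Point::x, &Point::y, &Point::z};

// Hypothetical helper: sum the coordinates by applying each pointer to p.
double sum(const Point& p) {
    double s = 0;
    for (double Point::* m : coords) {
        s += p.*m;  // .* binds the member pointer to an object
    }
    return s;
}

// With a pointer to the object, ->* is used instead of .*
void set_second(Point* p, double v) {
    p->*second = v;
}
```

Because a pointer to member is resolved against the class layout by the compiler, this does not depend on the absence of padding between the members.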

Jaa-c · Answer 2 · 2018-01-27T14:24:52.007

18

IMHO the easiest way is to just implement operator[]. You can make a helper array like this or just create a switch...

#include <array>
#include <cstddef>
#include <iostream>

struct Point
{
    double const& operator[] (std::size_t i) const 
    {
        const std::array coords {&x, &y, &z};
        return *coords[i];
    }

    double& operator[] (std::size_t i) 
    {
        const std::array coords {&x, &y, &z};
        return *coords[i];
    }

    double x;
    double y;
    double z;
};

int main() 
{
    Point p {1, 2, 3};
    std::cout << p[2] - p[1];
    return 0;
}
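
The "just create a switch" alternative mentioned above is not spelled out in the answer. One way it might look is sketched here; throwing `std::out_of_range` for a bad index is an assumption of this sketch, not something the answer prescribes.

```cpp
#include <cstddef>
#include <stdexcept>

struct Point {
    double x;
    double y;
    double z;

    double& operator[](std::size_t i) {
        switch (i) {
            case 0: return x;
            case 1: return y;
            case 2: return z;
        }
        // Assumption: out-of-range indices are reported by exception.
        throw std::out_of_range("Point index must be 0, 1 or 2");
    }

    double const& operator[](std::size_t i) const {
        switch (i) {
            case 0: return x;
            case 1: return y;
            case 2: return z;
        }
        throw std::out_of_range("Point index must be 0, 1 or 2");
    }
};
```

Unlike the helper-array version, nothing is built on each call, and an invalid index has a defined outcome.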

edited Jan 27 '18 at 14:24

answered Jan 26 '18 at 14:52

Jaa-c


Here, you have to dynamically initialize three pointers for each call, of which you only dereference one at return. It is better to use a static pointer-to-member array. – Tobi Jan 26 '18 at 15:09
If it's performance critical - yes. If not, I prefer code to be simple and readable. – Jaa-c Jan 26 '18 at 15:11
@Jaa-c if you want the code to be readable, then use class template argument deduction. – Guillaume Racicot Jan 26 '18 at 15:20
@Jaa-c I can provide you an edit that changes that if you wish – Guillaume Racicot Jan 26 '18 at 15:21
@Jaa-c I would edit to say that the performance of both is completely equivalent for optimized builds. If you look at the assembly output of the compiler, you'll see that it's pretty much the same. – Guillaume Racicot Jan 26 '18 at 15:26
@GuillaumeRacicot: I'm not sure how well it will optimize the first case... I'm on a cell phone, I'll check that later :) – Jaa-c Jan 26 '18 at 15:31
What makes you think that the second case is better than the first? In the second case, GCC-7.2 [is not able to elide the extra loads on `coords`](https://godbolt.org/g/XpdTvQ) – sbabbi Jan 26 '18 at 15:46
@sbabbi that's pretty nasty, should this be reported as a bug? – Guillaume Racicot Jan 26 '18 at 16:28
@GuillaumeRacicot well i guess so. clang-5.0 generates identical code in both cases. – sbabbi Jan 26 '18 at 16:44
Even with the second code, the generated assembly is still not optimal. It is better to use implementation-defined "pointer" arithmetic as in my answer. [Assembly comparison](https://godbolt.org/g/vmJLjX). Optimizers always have difficulty optimizing away table lookups. – Oliv Jan 26 '18 at 18:09
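
Tobi's first comment above suggests a static pointer-to-member array instead of rebuilding a pointer array on every call. A sketch of that combination is given below; it is an assumption about what was meant, and the local name `members` is invented here. No claim is made about the assembly any particular compiler generates for it.

```cpp
#include <cstddef>

struct Point {
    double x;
    double y;
    double z;

    double& operator[](std::size_t i) {
        // Built once at compile time; holds member designators, not addresses.
        static constexpr double Point::* members[3] =
            {&Point::x, &Point::y, &Point::z};
        return this->*members[i];
    }

    double const& operator[](std::size_t i) const {
        static constexpr double Point::* members[3] =
            {&Point::x, &Point::y, &Point::z};
        return this->*members[i];
    }
};
```

As with the original `operator[]`, an index outside 0..2 is not checked.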

Yakk - Adam Nevraumont · Answer 3 · 2018-01-26T15:54:04.893

2

#include <cstddef>   // std::size_t
#include <stdint.h>  // uintptr_t

struct Point {
  double x;
  double y;
  double z;
  double& operator[]( std::size_t i ) {
    auto self = reinterpret_cast<uintptr_t>( this );
    auto v = self+i*sizeof(double);
    return *reinterpret_cast<double*>(v);
  }
  double const& operator[]( std::size_t i ) const {
    auto self = reinterpret_cast<uintptr_t>( this );
    auto v = self+i*sizeof(double);
    return *reinterpret_cast<double const*>(v);
  }
};

This relies on there being no padding between the doubles in your `struct`. Asserting that is difficult.

A POD struct is guaranteed to be a contiguous sequence of bytes.

A compiler should be able to compile [] down to the same instructions (or lack thereof) as a raw array access or pointer arithmetic. There may be some problems where this optimization happens "too late" for other optimizations to occur, so double-check in performance-sensitive code.

It is possible that converting to char* or std::byte* instead of uintptr_t would be valid, but there is a core issue about whether pointer arithmetic is permitted in this case.

edited Jan 26 '18 at 15:54

answered Jan 26 '18 at 14:58

Yakk - Adam Nevraumont


I am curious, where is it allowed in the standard to do pointer arithmetic on `char*` within a `struct`? I often ask if it is allowed, and I systematically get a negative answer. – Oliv Jan 26 '18 at 15:18
@Oliv: You can always do pointer arithmetic inside an object. It is the way to retrieve the different bytes of the current representation. – Serge Ballesta Jan 26 '18 at 15:26
@SergeBallesta, this is surprising: this paragraph of the standard, [\[expr.add\]/4](https://timsong-cpp.github.io/cppwp/expr.add#4), defines pointer arithmetic inside an array (or a single object viewed as an array of one element) and finishes with *otherwise, the behavior is undefined*. Why do you say it is allowed? – Oliv Jan 26 '18 at 15:33
@Oliv: 6.9 Types [basic.types] says *4 The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T)*. In addition, my class is trivially copyable, which means that its value can be transported with `memcpy`, which merely copies bytes sequentially. – Serge Ballesta Jan 26 '18 at 15:41
@SergeBallesta As I said in the comment below my question, this is not clear. This is [core issue #1701](http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_active.html#1701). – Oliv Jan 26 '18 at 15:47
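
The `memcpy` remark above hints at a route that avoids the disputed pointer arithmetic altogether: copy the bytes of the struct into a real array and iterate over that. The sketch below assumes a padding-free, trivially copyable `Point` (checked with `static_assert`); `to_array` is a hypothetical helper, and whether this reading of the standard is airtight is not settled by this thread.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

struct Point {
    double x;
    double y;
    double z;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "memcpy requires a trivially copyable type");
static_assert(offsetof(Point, y) == sizeof(double) &&
              offsetof(Point, z) == 2 * sizeof(double) &&
              sizeof(Point) == 3 * sizeof(double),
              "Point must not contain padding");

// Hypothetical helper: copy the three members into a genuine array.
std::array<double, 3> to_array(const Point& p) {
    std::array<double, 3> a{};
    std::memcpy(a.data(), &p, sizeof p);
    return a;
}

double dist(const Point& p1, const Point& p2) {
    const std::array<double, 3> a = to_array(p1);
    const std::array<double, 3> b = to_array(p2);
    double d2 = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = b[i] - a[i];
        d2 += d * d;
    }
    return std::sqrt(d2);
}
```

The loop indexes only the local arrays, so [expr.add] is satisfied trivially. The copy is read-only with respect to the original `Point`.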

Oliv · Answer 4 · 2018-01-26T18:09:58.547

0

You could use the fact that casting a pointer to intptr_t, doing arithmetic on it, and then casting the value back to the pointer type is implementation-defined behavior. I believe it will work on most compilers:

#include <stdint.h>  // intptr_t

template<class T>
T* increment_pointer(T* a){
  return reinterpret_cast<T*>(reinterpret_cast<intptr_t>(a)+sizeof(T));
}

This technique is the most efficient; optimizers do not seem to be able to produce optimal code when a table lookup is used: assemblies-comparison

edited Jan 26 '18 at 18:09

answered Jan 26 '18 at 15:26

Oliv


@SergeBallesta This is your solution without the UB pointer arithmetic. Yakk uses the same problematic UB pointer arithmetic, although I believe that in his case this is more due to the fact that the term "object representation" may be underspecified. – Oliv Jan 26 '18 at 15:35
@SergeBallesta But the definition of UB in [intro.defs], *Undefined behavior may be expected when this document omits any explicit definition*, makes Yakk's solution UB. – Oliv Jan 26 '18 at 15:40
`reinterpret_cast<intptr_t>(a)` should be `reinterpret_cast<uintptr_t>(a)` ;) And `uintptr_t` avoids overflow possibilities from doing math on it (overflow of `intptr_t` is UB, and there is no guarantee that the `intptr_t` you get will be far from its overflow point) – Yakk - Adam Nevraumont Jan 26 '18 at 15:49
@Yakk I prefer intptr_t because the optimizer deals better with signed integers (due to the offered overflow possibility). `intptr_t` is large enough to store the value of any pointer by definition. What happens after is implementation-defined behavior. So Yakk, find an implementation where what you say could happen!! Can you imagine for one second that part of an object is at the top of the address space and the other part at the bottom? – Oliv Jan 26 '18 at 15:55
@oliv it is large enough to store any pointer, but there is no guarantee that given two pointers you can take the difference, or you can add to one pointer and reach the other. Using 32 bit pointers, assume the object is located at 0x7ffffff8 (aka, max_int+1-sizeof(double)). Then the second element is at 0x80000000, aka min_int. Adding sizeof(double) to the address of first element causes signed overflow. Both are still valid intptr_ts. If you used uintptr_ts, no problem. – Yakk - Adam Nevraumont Jan 26 '18 at 16:20
@Yakk What are you talking about? The double object is a subobject of a struct. Obviously one should not go outside the contiguous region of storage associated with a complete object. But here the OP is doing pointer arithmetic inside a complete object. – Oliv Jan 26 '18 at 17:29
@Yakk What you have done is a well-known technique described by Schopenhauer in his book [The Art of Being Right](https://en.wikipedia.org/wiki/The_Art_of_Being_Right), the 3rd technique for making others believe that a right statement is wrong: "Generalize Your Opponent's Specific Statements". – Oliv Jan 26 '18 at 17:31
@oliv 0x7ffffff8 is contiguous with 0x80000000. They are 8 units apart. Are you claiming that if a and b are pointers such that a+(b-a) == b then the same must be true of `intptr_t`? Can you cite that claim? I.e., you are claiming that no object in C++ can have addresses straddling the positive/negative edge of `intptr_t`. If this is the case, I was not aware of such a requirement in the standard, and it is unrelated to a requirement that all addresses be representable in `intptr_t`. – Yakk - Adam Nevraumont Jan 26 '18 at 18:46
@Yakk I am indeed claiming this last thing. Do you know any? Anyway, we are talking about implementation-defined behavior, no? If there is such a platform, then this is not allowed on it. But really, *you will not be able to find any platform that would do it*. – Oliv Jan 26 '18 at 21:16
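
The arithmetic in Yakk's 32-bit example can be checked without touching any pointer. The sketch below uses fixed-width integers as stand-ins for the `intptr_t` / `uintptr_t` of a hypothetical 32-bit platform with 8-byte `double`; all names are invented for illustration, and the signed check is done in 64 bits so that the sketch itself never overflows.

```cpp
#include <cstdint>
#include <limits>

// Stand-ins for a hypothetical 32-bit platform's intptr_t / uintptr_t.
using sptr32 = std::int32_t;
using uptr32 = std::uint32_t;

// sizeof(double) on that hypothetical platform.
constexpr std::int64_t kStep = 8;

// Address of the first double in the example: max_int + 1 - sizeof(double).
constexpr sptr32 kFirst = 0x7ffffff8;
static_assert(kFirst == std::numeric_limits<sptr32>::max() - 7,
              "0x7ffffff8 is 7 below the largest 32-bit signed value");

// Would kFirst + kStep fit in the signed type? Computed in 64 bits so
// that the check itself cannot overflow.
constexpr bool signed_add_overflows() {
    return static_cast<std::int64_t>(kFirst) + kStep
         > std::numeric_limits<sptr32>::max();
}

// Unsigned arithmetic is modular, so the same addition is well-defined.
constexpr uptr32 unsigned_next() {
    return static_cast<uptr32>(static_cast<uptr32>(kFirst)
                               + static_cast<uptr32>(kStep));
}
```

This only shows that the numbers in the example behave as described. Whether a real implementation ever places an object across that boundary is the point the two commenters disagree on.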
