
I have an array of structs, and I have a pointer to a member of one of those structs. I would like to know which element of the array contains the member. Here are two approaches:

#include <array>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// return which vertex the given coordinate is part of
int vertex_a(const triangle& tri, const float* coord)
{
    return reinterpret_cast<const xyz*>(coord) - tri.data();
}

int vertex_b(const triangle& tri, const float* coord)
{
    std::ptrdiff_t offset = reinterpret_cast<const char*>(coord) - reinterpret_cast<const char*>(tri.data());
    return offset / sizeof(xyz);
}

Here's a test driver:

#include <iostream>

int main()
{
    triangle tri{{{12.3, 45.6}, {7.89, 0.12}, {34.5, 6.78}}};
    for (const xyz& coord : tri) {
        std::cout
            << vertex_a(tri, &coord.x) << ' '
            << vertex_b(tri, &coord.x) << ' '
            << vertex_a(tri, &coord.y) << ' '
            << vertex_b(tri, &coord.y) << '\n';
    }
}

Both approaches produce the expected results:

0 0 0 0
1 1 1 1
2 2 2 2

But are they valid code?

In particular I wonder if vertex_a() might be invoking undefined behavior by casting float* y to xyz* since the result does not actually point to a struct xyz. That concern led me to write vertex_b(), which I think is safe (is it?).

Here's the code generated by GCC 6.3 with -O3:

vertex_a(std::array<xyz, 3ul> const&, float const*):
    movq    %rsi, %rax
    movabsq $-3689348814741910323, %rsi ; 0xCCC...CD
    subq    %rdi, %rax
    sarq    $3, %rax
    imulq   %rsi, %rax

vertex_b(std::array<xyz, 3ul> const&, float const*):
    subq    %rdi, %rsi
    movabsq $-3689348814741910323, %rdx ; 0xCCC...CD
    movq    %rsi, %rax
    mulq    %rdx
    movq    %rdx, %rax
    shrq    $5, %rax
John Zwinck

4 Answers


Neither is valid per the standard.


In vertex_a, you're allowed to convert a pointer to xyz::x to a pointer to xyz because they're pointer-interconvertible:

Two objects a and b are pointer-interconvertible if [...] one is a standard-layout class object and the other is the first non-static data member of that object [...]

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.

But you can't do the cast from a pointer to xyz::y to a pointer to xyz. That operation is undefined.
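As a minimal sketch of the one cast that rule does allow, assuming a hypothetical standard-layout struct point and a hypothetical helper owner_of_x (neither is from the question): a pointer to the first member can be converted back to a pointer to the enclosing object. The same helper applied to &p.y would not be covered by the rule.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct for illustration only; it has no std::string member,
// so it is standard-layout on every implementation.
struct point
{
    float x, y;
};

static_assert(std::is_standard_layout<point>::value,
              "pointer-interconvertibility needs a standard-layout class");

// Hypothetical helper: x is the first non-static data member of point, so a
// pointer to it may be converted to a pointer to the enclosing point.
// Passing &p.y instead would NOT be covered by this rule.
inline const point* owner_of_x(const float* px)
{
    return reinterpret_cast<const point*>(px);
}
```

For a point p, owner_of_x(&p.x) compares equal to &p.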


In vertex_b, you're subtracting two pointers to const char. That operation is defined in [expr.add] as:

If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined

Your expressions don't point to elements of an array of char, so the behavior is undefined.

Barry
  • Regarding `vertex_b()`, see the standard here: https://stackoverflow.com/a/37119041/4323 - it says "If a program attempts to access the stored value of an object through [...] other than one of the following types the behavior is undefined [...] - a char or unsigned char type." I am certain this means that reading a byte from any object after reinterpret_casting to `char*` is valid. So given that the cast is valid, and reading chars from the resulting `char*` is valid, I think this should satisfy your `[expr.add]` requirement. What do you think? – John Zwinck Jun 08 '17 at 01:30
  • @John None of that turns what you're pointing to into an array of `char`. Since there is no array that these pointers index into, the subtraction isn't defined. – Barry Jun 08 '17 at 02:47
  • OK, so what you're saying is that arithmetic on two char pointers is never OK if the objects they point to were not originally typed as char. Is that right? And you're saying this produces UB? That would be pretty surprising given how common this sort of thing is in networking code (which of course is the same sort of code that takes advantage of the right to cast anything to char* in the first place). – John Zwinck Jun 08 '17 at 04:04
  • Also, do you have any alternative implementation that you think is completely legal? – John Zwinck Jun 08 '17 at 04:21
  • @John Not just originally typed as char, but char *array*. The wording requires an array. – Barry Jun 08 '17 at 13:01
  • @Barry: Both C and C++ existed and were in wide use before the standards were written. The ability to treat objects in C, and PODS in C++, as sequences of character-type values has *always* been fundamental to both languages. Since the authors of the C Standard explicitly recognize that it does not mandate everything necessary to make an implementation be useful for any purpose, and the C++ Standard relies upon key aspects of the C Standard, anyone wanting to produce a *useful* implementation must support such behaviors whether or not the exact wording of the Standard would mandate support. – supercat Jun 08 '17 at 15:20
  • @supercat I don't know what you guys want. The question is asking what the standard says, I am telling you what the standard, quite clearly and unambiguously, says: subtracting arbitrary char pointers is undefined. – Barry Jun 08 '17 at 16:10
  • @Barry: Unless I'm misreading things, the pointers both identify bytes within the same array object. If `xyz` were a PODS, the byte representation of an array of `xyz` is the concatenation of the byte representations of the `xyz`s therein. While the presence of a `string` within `xyz` would mean it's not a PODS, the principle that `(char*)(ptr+x)==((char*)ptr)+x*sizeof *ptr` should still hold. – supercat Jun 08 '17 at 16:18
  • @supercat That's not how the C++ object model works. – Barry Jun 08 '17 at 16:19
  • @Barry: The C++ object model certainly allows for non-PODS objects to contain pointers or other links to objects which in turn contain links back to those non-PODS objects, in Unspecified fashion. If the `sizeof` operator doesn't specify the stride of arrays of non-PODS objects, what does the indicated value mean? – supercat Jun 08 '17 at 16:23
  • `char[3*sizeof(xyz)];` in combination with appropriately adjusted placement new (for simplicity let's consider byte alignment covered already), shouldn't this then suffice to comply with the standard (word by word)? – Aconcagua Jul 03 '17 at 16:08
  • Oh, my, even simpler: couldn't we just cast both pointers to `uintptr_t` to get the difference? This is legal, and as both pointers originate from contiguous memory, the difference should be meaningful... – Aconcagua Jul 03 '17 at 16:17
  • @Aconcagua: The only guarantee about `uintptr_t` is that converting a pointer to `uintptr_t` will yield a number that, when converted back to a pointer, will compare equal to the original. Although many implementations will guarantee that `(uintptr_t)(p+i) == ((uintptr_t)p) + (i*sizeof(*p))`, the Standard does not guarantee such behavior, nor even define a testable means (e.g. a `__STDC_LINEAR_POINTERS` macro) by which implementations can promise it. – supercat Jul 04 '17 at 19:45
  • @supercat So in the end: It is totally legal, but whether it works as desired is implementation-defined. In other words: No undefined, but unspecified behaviour, so in the sense of the question, valid code at least... – Aconcagua Jul 05 '17 at 07:58
  • @Aconcagua: Many implementations--probably 90%+--define the behavior the same way, but there is no "machine-readable" way of identifying the exceptions. – supercat Jul 05 '17 at 14:27
  • @supercat Point is: It might not be *fully portable* any more, but at least it is *valid*... – Aconcagua Jul 05 '17 at 14:39
  • @Aconcagua: If code relies upon platforms to define behaviors a certain way, but will compile cleanly on conforming implementations where it won't work, such code will be valid on implementations that define the behaviors they need and invalid on those that don't. Nothing in the Standard should in any way imply that a non-portable program that relies upon Undefined Behavior is in any way "invalid", unless one believes that the phrase "non-portable or erroneous" means "erroneous" even when applied to non-portable programs. – supercat Jul 05 '17 at 15:00
  • @Aconcagua: To be sure, the notion that "non-portable or erroneous" is synonymous with "erroneous" seems to be contagious among implementation writers who are more concerned with how well their compiler can process portable C programs for tasks that can be done in portable C than with how well the compiler would be able to perform tasks (including those not possible in portable C) using code written to exploit the target platform. – supercat Jul 05 '17 at 15:04
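
For reference, the `uintptr_t` idea from the comments above could be sketched roughly as follows. The name vertex_c is hypothetical, and the code assumes an implementation where the pointer-to-integer mapping is linear in the address, which the comments note the Standard does not promise; std::uintptr_t is also an optional type.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical vertex_c: subtract integer representations of the pointers
// instead of the pointers themselves.
// ASSUMPTION: converting a pointer to std::uintptr_t yields its linear byte
// address. That mapping is implementation-defined, so the result is only
// meaningful on implementations that document it this way.
inline int vertex_c(const triangle& tri, const float* coord)
{
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(tri.data());
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(coord);
    return static_cast<int>((addr - base) / sizeof(xyz));
}
```

On mainstream desktop implementations this should return the same indices as vertex_b for pointers into the array; for pointers outside the array the result is meaningless.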

vertex_a indeed breaks the strict aliasing rule (none of your floats are valid xyzs, and in 50% of your example they're not even at the start of an xyz even if there's no padding).

vertex_b relies on, shall we say, creative interpretation of the standard. Though your cast to const char* is sound, performing arithmetic with it around the rest of the array is a little more dodgy. Historically I've concluded that this kind of thing has undefined behaviour, because "the object" in this context is the xyz, not the array. However, I'm leaning towards others' interpretation nowadays that this will always work, and wouldn't expect anything else in practice.

Lightness Races in Orbit

vertex_b is completely fine. At most you may want to refine return offset / sizeof(xyz); since it divides a std::ptrdiff_t by a std::size_t and implicitly converts the result to int. Strictly by the book, that conversion is implementation-defined: std::ptrdiff_t is signed, std::size_t is unsigned, and with a huge array on some platforms or compilers the quotient could exceed INT_MAX (very unlikely).

To cast away your worries, you can put assert()s and/or #errors which check PTRDIFF_MIN, PTRDIFF_MAX, SIZE_MAX, INT_MIN and INT_MAX, but I personally would not bother so much.
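
A minimal sketch of what such checks might look like, assuming run-time assert()s are acceptable; the name vertex_b_checked is hypothetical. It keeps the division signed and verifies the range before narrowing to int. It only addresses the conversion concern, not the question of whether the char pointer subtraction itself is defined.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical variant of vertex_b: same byte-offset computation, but the
// division stays in signed arithmetic and the result is range-checked
// before the explicit conversion to int.
inline int vertex_b_checked(const triangle& tri, const float* coord)
{
    const std::ptrdiff_t offset =
        reinterpret_cast<const char*>(coord) -
        reinterpret_cast<const char*>(tri.data());
    assert(offset >= 0); // coord must not lie before the array

    const std::ptrdiff_t index =
        offset / static_cast<std::ptrdiff_t>(sizeof(xyz));
    assert(index < static_cast<std::ptrdiff_t>(tri.size()));
    assert(index <= INT_MAX);

    return static_cast<int>(index);
}
```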

BJovke
  • What about `vertex_a()`? It was suggested that it breaks strict aliasing, but I don't see how because it does not dereference the pointer. – John Zwinck Jul 05 '17 at 04:13
  • `vertex_a` is wrong if `coord` can point to member `y` of `xyz`. The main idea behind pointer arithmetic, ever since it appeared (in C), is to point to elements, not to arbitrary memory locations (the two coincide only when the element size is one byte). And `coord` may not point to the start of an `xyz`; you're even allowing a check for any value of `coord`. – BJovke Jul 05 '17 at 08:46
  • Memory alignment is broken in this case if the required alignment is not a multiple of the `float` size. While many CPUs will let you store such an address in a register and will not produce an error until you try to read or write through it, there's no such guarantee. Some CPUs (microcontrollers) might not even have the lowest N bits of the address register, as a design simplification, and might not have an instruction to load such an address into a register, because that instruction would probably lack the lowest N bits as well. – BJovke Jul 05 '17 at 08:54

Perhaps a more robust approach would be to make the member part of the function's signature: instead of accepting any float*, take a template argument T that identifies which member is meant (xyz::x or xyz::y as needed).

Then you can use offsetof(xyz, T) to confidently compute the location of the start of the struct, in a way that should be more resilient to future changes in its definition.

The rest follows what you are currently doing: once you have a pointer to the start of the struct, finding its index in the array is a valid pointer subtraction.

There is some pointer nastiness involved, but this approach is used in practice; see, for example, the container_of() macro in the Linux kernel: https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/067/6717/6717s2.html
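
A rough sketch of the idea, in the spirit of container_of(). The name vertex_of is hypothetical, and the member is identified here by its byte offset as a template argument rather than by a type T, since offsetof needs the member's name. The step back to the start of the struct is still char pointer arithmetic, so this does not settle the standards question discussed in the other answers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// Hypothetical helper: Offset is the byte offset of the member that
// `member` points to, e.g. offsetof(xyz, y).
// NOTE: offsetof is only conditionally supported for classes that are not
// standard-layout, and xyz contains a std::string.
template <std::size_t Offset>
int vertex_of(const triangle& tri, const float* member)
{
    // Step back from the member to the start of its enclosing xyz.
    const char* start = reinterpret_cast<const char*>(member) - Offset;
    // Ordinary subtraction of pointers to elements of the same array.
    return static_cast<int>(reinterpret_cast<const xyz*>(start) - tri.data());
}
```

A call would look like vertex_of<offsetof(xyz, y)>(tri, &tri[1].y), so the caller states explicitly which member the pointer refers to.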

Jesse Cohen