Say I have global variables defined in a TU such as:

```cpp
#include <string>

extern const std::string s0{"s0"};
extern const std::string s1{"s11"};
extern const std::string s2{"s222"};
// etc...
```

And a function get_1 that returns the size of one of them depending on an index:

```cpp
size_t get_1(size_t i)
{
    switch (i)
    {
        case 0: return s0.size();
        case 1: return s1.size();
        case 2: return s2.size();
        // etc...
    }
    return 0; // out-of-range index; flowing off the end would be undefined behaviour
}
```

And someone proposes replacing get_1 with get_2:

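```cpp
size_t get_2(size_t i)
{
    // Relies on s0, s1, s2, ... being laid out contiguously and in declaration order.
    return (&s0 + i)->size();
}
```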
  1. Are global variables defined next to each other in a translation unit like this guaranteed to be stored contiguously, and in the order defined?
    • I.e., will &s1 == &s0 + 1 and &s2 == &s1 + 1 always be true?
    • Or can a compiler (does the standard allow a compiler to) place s0 higher than s1 in memory, i.e., swap them?
  2. Is it well-defined behaviour to perform pointer arithmetic, like in get_2, over such variables? (Crucially, they aren't in the same subobject or in an array etc.; they're just globals like this.)
    • Do the rules about using relational operators on pointers from https://stackoverflow.com/a/9086675/8594193 apply to pointer arithmetic too? (And is the last comment on that answer, saying that std::less and friends yield a total order over any void*s where the normal relational operators don't, relevant here too?)

Edit: this is not a duplicate of the questions about variables on the stack and their layout in memory; I'm already aware of those. I was specifically asking about global variables. Although the answer turns out to be the same, the question is not.

Arghnews
  • I don't have references to the standard, so I'm not going to answer, but (1) Not guaranteed (2) No, pointer arithmetic that goes outside the bounds of an object or array is generally UB. – CoffeeTableEspresso Nov 22 '22 at 13:50
  • The C++ Standard only specifies/guarantees memory layout details for arrays and members of structures. Any pointer arithmetic like yours is undefined behaviour. – Adrian Mole Nov 22 '22 at 13:56
  • [There is simply no guarantee about the order of addresses of variables in C++ at all.](https://stackoverflow.com/a/73424358/12002570) – Jason Nov 22 '22 at 14:06
  • @mch Just asking about the case here, I'm aware it could be changed (this isn't the code anyway, just an example). Ty all for your answers. – Arghnews Nov 22 '22 at 14:07
  • [There is no requirement on the relationship between addresses of variables that are not part of the same array, or object. **They don't have to be contiguous**, or anything like that.](https://stackoverflow.com/a/63794312/12002570) – Jason Nov 22 '22 at 14:09
  • Curious why you didn't declare as `std::string s[] = { "s0", "s11", "s222" };` What's the point of creating variable names with numbers in them, when it seems pretty clear that you are treating the numerical part `0` of the variable name `s0` as an _array index_? – Wyck Nov 22 '22 at 14:11
  • Reopened. The claimed duplicates are about **stack** variables; they do not address the layout of globals. – Pete Becker Nov 22 '22 at 15:09
  • @Wyck this is some simplified sample code that mirrors the real code I'm looking at. You're right though, a refactor into an array would be better. But I nonetheless wanted to know if this was well defined behaviour/ask this question. – Arghnews Nov 22 '22 at 15:54

1 Answer


Pointer arithmetic on disparate objects yields undefined behavior as per [expr.add]:

> 4 When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
>
> (4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
>
> (4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
>
> (4.3) — Otherwise, the behavior is undefined.

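Since s0 through s2 are not elements of an array, get_2 has explicitly documented undefined behavior. Strictly speaking, [basic.compound] treats an object that is not an array element as a one-element array for the purposes of pointer arithmetic, so &s0 + 1 is a valid past-the-end pointer; but dereferencing it is undefined, and &s0 + i for any i ≥ 2 is undefined even before it is dereferenced. So get_2(i) is only well-defined for i == 0.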
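If the goal is to keep the separate named globals (other translation units may refer to them by name) but still look them up by index, one well-defined option is to index a real array of pointers to them, so the only pointer arithmetic happens inside an actual array object. A minimal sketch, assuming the three globals from the question; the name `get_3` and the use of `std::array::at` (which throws `std::out_of_range` for a bad index) are illustrative choices, not part of the original code:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Globals as in the question, defined here so the sketch is self-contained.
extern const std::string s0{"s0"};
extern const std::string s1{"s11"};
extern const std::string s2{"s222"};

// get_3 is a hypothetical, UB-free replacement for get_2: it indexes a genuine
// array of pointers, so no arithmetic is done on &s0 itself.
std::size_t get_3(std::size_t i)
{
    static const std::array<const std::string*, 3> table{&s0, &s1, &s2};
    return table.at(i)->size(); // at() throws std::out_of_range if i >= 3
}
```

This is essentially the array refactor suggested in the comments, applied to pointers so the original variables can stay where they are.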

As far as I can tell, the standard puts no constraints on the order of these variables in memory, so the compiler may order them any way it wants, with any amount of padding or other variables between them. This is not stated explicitly, but, as was pointed out to me in the comments, [expr.eq] and [expr.rel] leave the results of comparing such pointers unspecified. In particular, [expr.eq] states about the operators == and != that

> (3.1) — If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object, the result of the comparison is unspecified.

and [expr.rel] states about <, >, <= and >= that

> 4 The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
>
> (4.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
>
> (4.2) — If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control (11.9), neither member is a subobject of zero size, and their class is not a union.
>
> (4.3) — Otherwise, neither pointer is required to compare greater than the other.

Again, since s0, s1 and s2 are neither elements of the same array nor members of the same object, (4.3) applies, and the results of comparing pointers to them are unspecified. In practical terms, this means that the compiler can order them in memory in an arbitrary fashion.
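On the question's side note about `std::less`: the standard does guarantee that `std::less` (and `std::greater`, `std::less_equal`, `std::greater_equal`) on pointers yields a strict total order, even where the built-in `<` leaves the result unspecified. That order is implementation-defined, though, so it says nothing about whether `s1` sits directly after `s0`, and it does not make `&s0 + 1` a valid way to reach `s1`. A small sketch, assuming the same three globals; `ordered_addresses` is a hypothetical helper name:

```cpp
#include <algorithm>
#include <array>
#include <functional>
#include <string>

// Globals as in the question, defined here so the sketch is self-contained.
extern const std::string s0{"s0"};
extern const std::string s1{"s11"};
extern const std::string s2{"s222"};

// Hypothetical helper: sorts the addresses with std::less, which is guaranteed
// to be a strict total order over pointers even for unrelated objects.
// The resulting order is implementation-defined and may or may not match
// declaration order.
std::array<const std::string*, 3> ordered_addresses()
{
    std::array<const std::string*, 3> ptrs{&s0, &s1, &s2};
    std::sort(ptrs.begin(), ptrs.end(), std::less<const std::string*>{});
    return ptrs;
}
```

Sorting with the built-in `<` instead would rely on unspecified comparison results, which is exactly what (4.3) above leaves open.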

Wintermute