Does (p+x)-x always result in p for pointer p and integer x in gcc linux x86-64 C++

Question

Suppose we have:

char* p;
int   x;

As recently discussed in another question, arithmetic including comparison operations on invalid pointers can generate unexpected behavior in gcc linux x86-64 C++. This new question is specifically about the expression (p+x)-x: can it generate unexpected behavior (i.e., result not beingp) in any existing GCC version running on x86-64 linux?

Note that this question is just about pointer arithmetic; there is absolutely no intention to access the location designated by *(p+x), which obviously would be unpredictable in general.

The practical interest here is non-zero-based arrays. Note that (p+x) and the subtraction by x happen in different places in the code in these applications.

If recent GCC versions on x86-64 can be shown to never generate unexpected behavior for (p+x)-x then these versions can be certified for non-zero-based arrays, and future versions generating unexpected behavior could be modified or configured to support this certification.

UPDATE

For the practical case described above, we could also assume p itself is a valid pointer and p != NULL.

"can generate UB in gcc linux x86-64 C++" - It does not *generate* UB. It *is* UB in the C++ the language - the compiler and platform is irrelevant as far as whether it is UB or not. — Jesper Juhl, Mar 03 '19 at 10:50
"Specifically, if recent GCC versions on x86-64 can be shown to never generate UB for (p+x)-x then these versions can be certified for non-zero-based arrays". I guess I'm the only one who's too dumb to understand this? — Johannes Schaub - litb, Mar 03 '19 at 10:51
@Johannes Schaub What I meant is, if GCC has a great feature (specifically, math on invalid pointers) that's not currently named, we can give it a name, and then demand it in future versions too. — personal_cloud, Mar 03 '19 at 10:52
You are still completely misunderstanding what UB means. (Yes, I read all your comments in your other question.) The UB is *in the source code*. It isn’t generated by the compiler or the system. — prl, Mar 03 '19 at 10:54
It is true that a compiler can define behavior for language constructs that the language standard says are undefined, and in fact gcc has a lot of extensions that fall in this category. But I don’t think gcc defines any behavior for pointer arithmetic that goes beyond what the language standards define. — prl, Mar 03 '19 at 10:57
@prl OK OK I've scrubbed the phrase "UB" from the question. I meant *unexpected behavior*. Right, GCC possibly hasn't given the feature a name yet... the question is about its current implementation. — personal_cloud, Mar 03 '19 at 11:03
Gcc is sufficiently complex that it is very difficult to determine that it cannot have unexpected behavior in any situation involving undefined behavior. I certainly wouldn’t rule it out for the conditions you’re describing. — prl, Mar 03 '19 at 11:07
Have you looked into creating a non-zero based array abstract data type? — Galik, Mar 03 '19 at 11:12
@Galik Yes, that was discussed [here](https://stackoverflow.com/questions/54951999/). Somebody posted (and deleted) a good answer involving `intptr_t`; there were other answers that did not achieve the desired efficiency (one register to represent the array). I think the reason that question was downvoted was because I did not specify gcc linux x86-64. I should open a new one... C++ non-zero-based-arrays on gcc linux x86-64. — personal_cloud, Mar 03 '19 at 11:16
If there were interest in supporting non-zero-based arrays in gcc, it would surely be done by adding a specific feature, not by generally defining the behavior you’re asking about. — prl, Mar 03 '19 at 11:17
@prl That depends on whether existing GCCs already support it. If so, then they have the behavior I'm asking about, without specifically supporting non-zero-based arrays. So there are potentially 2 paths to non-zero-based arrays. — personal_cloud, Mar 03 '19 at 11:19
Yes, current implementations may have the behavior you want (although I doubt it), but the maintainers are never going to agree to document it and thus guarantee dependable support for it. — prl, Mar 03 '19 at 11:22
@prl Out of curiosity, why do you doubt that `(p+x)-x` would always behave as I expect? Also, I agree with your other point about documenting it being unlikely unless/until GCC supports non-zero-based arrays, but there's a chicken-and-egg problem: if nobody admits to using the feature, it is *certain* not to get documented. You could maybe create a proper version with `intptr_t` but that has its own risks perhaps (typecasts...). — personal_cloud, Mar 03 '19 at 11:33
@personal_cloud That doesn't look like an abstract data type, more like dealing with an ofsetted pointer. Do you even need to use a pointer at all? Can't you just use a class object with `operator[]` defined? — Galik, Mar 03 '19 at 11:43
@Galik Sure, but in the implementation of that class, how do you do the `array[i]` operation while using only a single register for `array`, if `array` is not zero-based? — personal_cloud, Mar 03 '19 at 11:48
The compiler should do that for you by caching the class' internal pointer in a register and doing the offset math accordingly with the offsetted index. I guess you have to trust the compiler is good at optimizing these things. — Galik, Mar 03 '19 at 11:52
@Galik In many cases, yes I agree, that should work. But I don't think the compiler is allowed to change the structures in memory, as there are linking standards etc. So I am afraid that on a *cold* fetch of the class it would have to retrieve both `allocated_zero_based_array` and the `offset` and add them, to populate the cache you describe. — personal_cloud, Mar 03 '19 at 11:55
The CPU has to add something whatever scheme you devise. How is adjusting the index (my scheme) different from adjusting the pointer (your scheme?)? — Galik, Mar 03 '19 at 11:58
@Galik The idea is to use an existing index across many subarray tiles, each implementing a section of a large, virtual array. Very common idea in graphics. So in your scheme the index would have to be adjusted separately for each subarray. — personal_cloud, Mar 03 '19 at 21:22
@JesperJuhl: your first comment is a slight overstatement for the general case: it's possible for a C implementation to define behaviour that ISO C leaves undefined. e.g. `gcc -fwrapv` defines signed overflow as 2's complement wrap-around. `gcc -fno-strict-aliasing` defines the behaviour of `uint32_t float_bits = *(int*)&my_float;`. But in this specific case, GCC doesn't go out of its way to define the behaviour of math on pointers outside of objects. It usually works the way you expect anyway, but that doesn't mean it's not technically UB. (Like I answered on the OP's earlier question) — Peter Cordes, Mar 03 '19 at 21:53

prl · Answer 1 · 2019-03-03T11:40:29.397

1

Here’s a list of gcc extensions. https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

There is an extension for pointer arithmetic. Gcc allows performing pointer arithmetic on void pointers. (Not the extension you’re asking about.)

So, gcc treats the behavior for the pointer arithmetic you’re asking about as undefined under the same conditions as described in the language standard.

You can look through there and see if there is anything I missed that’s relevant to your question.

edited Mar 03 '19 at 11:40

answered Mar 03 '19 at 11:04

prl

11,716
2
13
31

Thank you for sharing the list. Can you explain why a "(p+x)-x evaluates to p" feature would have to be an *extension*? Isn't it possible that the core engine already has this feature? After all, C++ would not disallow it. (In fact, the C++ spec would make it difficult to *avoid* implementing the feature in x86-64. A compiler would have a difficult time determining that `p` is *neither* a memory location *nor* an array element *nor* in an allocated page) – personal_cloud Mar 03 '19 at 11:10
2

If it’s not documented as an extension, then it is UB under the same conditions as in the standard. And if it’s UB, the compiler is likely, in some situations, to generate code that has “unexpected behavior“. – prl Mar 03 '19 at 11:12
"GCC has undefined behavior". Can you please explain what that means? Does the GCC project define the term somewhere? Note that we're in the language-lawyer-troll-area here, so I'm unsure whether this is acceptable without a definition! – Johannes Schaub - litb Mar 03 '19 at 11:32
@Johannes, I mean that if the standard says the behavior is undefined and gcc doesn’t define it itself, then gcc treats the behavior as undefined. – prl Mar 03 '19 at 11:37
I would still like to know why you doubt that the current implementation has the property I expect, but I have to accept your answer as being valid and pertinent based on a reasonable first-level search of the documentation. I appreciate your insights on the topic. Thank you for sharing them. – personal_cloud Mar 03 '19 at 11:51

score 1 · Answer 2 · answered Mar 03 '19 at 11:07

You do not understand what "undefined behavior" is, and I cannot blame you, given that it is often poorly explained. This is how the standard defines undefined behavior, section 3.27 in intro.defs:

behavior for which this document imposes no requirements

That's it. Nothing less, nothing more. The standard can be thought as a series of constraints for compiler vendors to follow when generating valid programs. When there's undefined behavior, all bets are off.

Some people say that undefined behavior can lead to your program spawning dragons or reformatting your hard drive, but I find that to be a bit of a strawman. More realistically, something like going past the ends of the bounds of an array can result in a seg fault (due to triggering a page fault). Sometimes undefined behavior allows compilers to make optimizations that can change the behavior of your program in unexpected ways, since there's nothing saying the compiler can't.

The point is that compilers not "generate undefined behavior". Undefined behavior exists in your program.

What I meant is, if GCC has a great feature (specifically, math on invalid pointers) that's not currently named, we can give it a name, and then demand it in future versions too.

Then it would be a non-standard extension and one would expect it to be documented. I also highly doubt that such a feature would be in high demand given that it would not only allow people to write unsafe code, but it would be extremely hard to generate portable programs for.

Yes, I know what UB means in general. For this question, I clarified that I meant to write *unexpected behavior* which I define as `(p+x)-x` not resulting in `p` (which *ought* to be very clear from the title of the question) and in a very specific implementation of C++, namely gcc linux x86-64. — personal_cloud, Mar 03 '19 at 11:25
`undefined behavior can lead to your program spawning dragons` to be fair, you need to have a dragon spawning adapter installed in the system. — eerorika, Mar 03 '19 at 12:23

score 1 · Accepted Answer · answered Mar 04 '19 at 01:58

Yes, for gcc5.x and later specifically, that specific expression is optimized very early to just p, even with optimization disabled, regardless of any possible runtime UB.

This happens even with a static array and compile-time constant size. gcc -fsanitize=undefined doesn't insert any instrumentation to look for it either. Also no warnings at -Wall -Wextra -Wpedantic

int *add(int *p, long long x) {
    return (p+x) - x;
}

int *visible_UB(void) {
    static int arr[100];
    return (arr+200) - 200;
}

Using gcc -dump-tree-original to dump its internal representation of program logic before any optimization passes shows that this optimization happened even before that in gcc5.x and newer. (And happens even at -O0).

;; Function int* add(int*, long long int) (null)
;; enabled by -tree-original

return <retval> = p;


;; Function int* visible_UB() (null)
;; enabled by -tree-original
{
  static int arr[100];

    static int arr[100];
  return <retval> = (int *) &arr;
}

That's from the Godbolt compiler explorer with gcc8.3 with -O0.

The x86-64 asm output is just:

; g++8.3 -O0 
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi    # spill args
    mov     rax, QWORD PTR [rsp-8]     # reload only the pointer
    ret
visible_UB():
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

-O3 output is of course just mov rax, rdi

gcc4.9 and earlier only do this optimization in a later pass, and not at -O0: the tree dump still includes the subtract, and the x86-64 asm is

# g++4.9.4 -O0
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi
    mov     rax, QWORD PTR [rsp-16]
    lea     rdx, [0+rax*4]            # RDX = x*4 = x*sizeof(int)
    mov     rax, QWORD PTR [rsp-16]
    sal     rax, 2
    neg     rax                       # RAX = -(x*4)
    add     rdx, rax                  # RDX = x*4 + (-(x*4)) = 0
    mov     rax, QWORD PTR [rsp-8]
    add     rax, rdx                  # p += x + (-x)
    ret

visible_UB():       # but constants still optimize away at -O0
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

This does line up with the -fdump-tree-original output:

return <retval> = p + ((sizetype) ((long unsigned int) x * 4) + -(sizetype) ((long unsigned int) x * 4));

If x*4 overflows, you'll still get the right answer. In practice I can't think of a way to write a function that would lead to the UB causing an observable change in behaviour.

As part of a larger function, a compiler would be allowed to infer some range info, like that p[x] is part of the same object as p[0], so reading memory in between / out that far is allowed and won't segfault. e.g. allowing auto-vectorization of a search loop.

But I doubt that gcc even looks for that, let alone takes advantage of it.

(Note that your question title was specific to gcc targeting x86-64 on Linux, not about whether similar things are safe in gcc, e.g. if done in separate statements. I mean yes probably safe in practice, but won't be optimized away almost immediately after parsing. And definitely not about C++ in general.)

I highly recommend not doing this. Use uintptr_t to hold pointer-like values that aren't actual valid pointers. like you're doing in the updates to your answer on C++ gcc extension for non-zero-based array pointer allocation?.

Thanks for teaching me about `-fdump-tree-original`; that's really cool. Anyways, yes, I intended that the `(p+x)` and subtraction could happen in different places, as in the non-zero-based array pointer question; I've updated the question. And yes, I was looking for what kinds of optimizations the compiler could conceivably do that would break `(p+x)-x==p`. Your prefetching example is conceivable (and, as you say, unlikely). — personal_cloud, Mar 04 '19 at 06:29
@personal_cloud: I didn't mean for prefetching, I meant to auto-vectorize a search loop like `while(*p++ != 0){}` (i.e. strlen) using 16-byte loads and `pcmpeqb` / `pmovmskb` / `test+jz`. So you unavoidably touch data past the terminating 0 byte, unless it happens to be at the end of a vector. To make that safe, you have to align your pointer so you don't cross into a page that doesn't contain any bytes you're allowed to read, and thus might be unmapped. [Is it safe to read past the end of a buffer within the same page on x86 and x64?](//stackoverflow.com/q/37800739). — Peter Cordes, Mar 04 '19 at 06:32
If the compiler "knows" it's safe, it could potentially decide to use unaligned loads from whatever starting point, and thus could segfault if the terminator is the last byte of a page. (But yeah, unlikely, because it has to account for the case where the terminator isn't found and it keeps reading beyond that known-safe distance.) Besides the fact that gcc/clang never auto-vectorize when the loop trip-count isn't known before the first iteration runs. ICC will, though. — Peter Cordes, Mar 04 '19 at 06:35
I see. So conceivably, if I have a bogus pointer `q=(p+100)` and later I do a search loop starting at `q[-100]`, the compiler looks at that and thinks it can load all the way up to `q`, which crosses a page boundary (because `p` was only size 32, say) and segfaults on 1% of runs... Plausible. — personal_cloud, Mar 04 '19 at 06:51
@personal_cloud: yes, exactly. I think ISO C UB rules allow that. — Peter Cordes, Mar 04 '19 at 06:58
I really like this answer because it basically answers the question (as a "yes, very likely, `(p+x)-x==p`") and also clearly illustrates what kinds of UB could result even if the arithmetic itself is OK -- basically showing why I should care about UB side effects, not just the direct result of the UB-causing computation itself. — personal_cloud, Mar 04 '19 at 06:59
At the same time, you'd have to wonder why such an optimization would be written to take advantage of this or that pointer just lying around in the program... Relying on 4K page boundary would seem to work so much more often. Nonetheless, others have made the point that gcc is very complicated and adding tricky optimizations all the time. I first ran into this sort of thing on [this question](https://stackoverflow.com/questions/43152787/) which turned out to be a bad type pun on `bool`s. (only specific types are good for `memcpy`-like operations... and not `bool`...) — personal_cloud, Mar 04 '19 at 07:05
@personal_cloud: yup, this is what everyone has been telling you: UB doesn't just mean you can get the wrong answer for a simple test, it really does mean everything *else* in your whole program can break, too, possibly in subtle ways given certain corner-cases. This one is pretty obscure, though, and quality implementations like gcc might intentionally *not* optimize based on this UB even if they could. It's probably less well-known than signed-overflow UB. That's the classic example, e.g. http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html (about UB) — Peter Cordes, Mar 04 '19 at 07:08
@personal_cloud: yeah, strict-aliasing is the other well-known kind of UB that compilers other than MSVC optimize based on, and have been for many years. Interesting that only gcc5 and newer actually break that buggy code. Also, that proves definitively that casting a pointer to an `__m256i` is not safe. `__m256i*` is like `char*` and can alias anything, but that doesn't work the other direction. That's a useful example, because trivial cases often work anyway. — Peter Cordes, Mar 04 '19 at 07:18
I think `-fsanitize=pointer-overflow` is supposed to instrument this, it looks like a bug that it does not disable the optimization. — Marc Glisse, Mar 04 '19 at 17:21

Does (p+x)-x always result in p for pointer p and integer x in gcc linux x86-64 C++

3 Answers3

Linked