Is this write to an array truly undefined behavior in C?

Question

This code writes a value through a pointer if one array is one past the end of another array.

#include <stdio.h>
#include <inttypes.h>

extern int first[], second[];

#define ADDR_AFTER(ptr) ((uintptr_t)((ptr) + 1))

int test(int *an_int) {
    *second = 1;
    if (ADDR_AFTER(first) == (uintptr_t)an_int) {
        // ubsan does not like this.
        *an_int = 2;
    }
    return *second;
}

int first[1] = {0}, second[1] = {0};

int main() {
    if (ADDR_AFTER(first) == (uintptr_t)second) {
        printf("test: %d\n", test(second));
        printf("x: %d y: %d\n", *first, *second);
    }
}

At no point do I directly compare two pointers to different objects (because I convert them to uintptr_t first). I create a pointer to one-past-the-end of an array (which is legal), but I never dereference that pointer. As far as I can tell this should either print nothing, or print:

test: 2
x: 0 y: 2

Which is printed on Clang when optimization is -O1 or lower. At -O2, however, it prints:

test: 1
x: 0 y: 2

With -O2 -fsanitize=undefined it prints the this to stdout:

test: 2
x: 0 y: 2

and the following to stderr:

runtime error: store to address 0x000000cd9efc with insufficient space for an object of type 'int'
0x000000cd9efc: note: pointer points here
  00 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00

with a reference to the assignment to an_int in test. Is this actually undefined behavior, or is this a bug in Clang?

I suspect this is violating the strict aliasing rule. The optimizer assumes this doesn't happen. — Barmar, Nov 13 '21 at 00:34
The fact that optimizing makes it turn ugly is a good hint. Optimizing should never change the behavior of a well behaved program. — Ted Lyngmo, Nov 13 '21 at 00:37
Strict aliasing doesn't apply here because `p` is a pointer to `int` that hasn't been marked `restrict`. It is therefore explicitly allowed to alias `y`. — Michael Morris, Nov 13 '21 at 00:38
@MichaelMorris Can you please rename your variables into something searchable so that talking about the code becomes easier? `p` ... not a good search item. Give them names ... proper ones! — Ted Lyngmo, Nov 13 '21 at 00:40
Renaming the one variable I mentioned is indeed better but .. do you want to make it easy for people to answer it or do you want to obscure the issues? Do one better please. Give all variables proper names. — Ted Lyngmo, Nov 13 '21 at 00:48
your code is pretty convoluted. And you have different code for optimized and non-optimized which make it that much harder to follow. Right now I don't want to invest time to understand this code, but are you sure the difference you see in the optimized vs non-optimized code isn't cause because you have different code for optimized and non-optimized code via that preprocessor `#if #else`? — bolov, Nov 13 '21 at 00:54
@bolov I've removed the check and can confirm that the issue is not caused by the `#if #else` I've also simplified the code. — Michael Morris, Nov 13 '21 at 00:59
Not quite a dup but closely related: https://stackoverflow.com/questions/45966762/can-an-equality-comparison-of-unrelated-pointers-evaluate-to-true — dbush, Nov 13 '21 at 01:04
Great job on simplifying the code. The code looks ok to me. There's no way `*an_int = 2;` is UB because `an_int` is just `second`. — bolov, Nov 13 '21 at 01:15
fwiw, this shouldn't be `UB`; it is the normal exercising of pointer arithmetic. You are only dealing with +1 pointers to array elements, and +1's have yet to be destroyed by standardization. That said, the clown car is ready for another lap, so this could soon be UB. — mevets, Nov 13 '21 at 01:16
Using the n4713 draft, the relevant part of the standard is section 8.5.10 paragraph (2.1): "If one pointer represents the address of a complete object, and another pointer represents the addressone past the last element of a different complete object,88the result of the comparison is unspecified." but note that "unspecified" is not the same as "undefined". — o11c, Nov 13 '21 at 01:18
This is a known compiler bug. There is an existing question on it; I will look for it. The compiler incorrectly “deduces” that when execution is inside the “then” clause of an `if (a == b)` that compares two pointers, then `a` and `b` must point to elements within the same “array” (using the definition of a “array” for the `==` operator, which treats a single object as an array of one element). — Eric Postpischil, Nov 13 '21 at 01:22
@o11c Does that mean that it would be conformant for the comparison to succeed in `main` but fail in `test`? — Michael Morris, Nov 13 '21 at 01:24
@EricPostpischil Funny, I was converting to `uintptr_t` before the comparison specifically to avoid dealing with exactly that sort of bug. — Michael Morris, Nov 13 '21 at 01:36
@MichaelMorris: “If you lie to the compiler, it will get its revenge.” — Henry Spencer. Not an outright lie, perhaps, just a little attempt at subterfuge. Nonetheless, the compiler is displeased with you. — Eric Postpischil, Nov 13 '21 at 01:37

John Zwinck · Accepted Answer · 2021-11-13T01:35:05.297

There's nothing invalid about your code, the compiler is wrong. If you remove the unnecessary ADDR_AFTER check in test(), the code runs as expected with no UBSan error. If you run it with optimization enabled and without UBSan, you get the wrong output (test=1, should be 2).

Something about the ADDR_AFTER(first) == (uintptr_t)an_int code inside test() makes Clang do the wrong thing when compiling with -O2.

I tested with Apple clang version 11.0.3 (clang-1103.0.32.62) but it looks like Clang 13 and current trunk also have the bug: https://godbolt.org/z/s83ncTsbf - if you change the compiler to any version of GCC you'll see it can return 1 or 2 from main(), while Clang always returns 1 (mov eax, 1).

You should probably file a Clang bug for this.

Is this write to an array truly undefined behavior in C?

1 Answers1