15

I want to write a function that takes an array of data as input and writes another array of data as output, using pointers.

I'm wondering what happens if both src and dst point to the same address, because I know the compiler can optimize for const. Is it undefined behavior? (I tagged both C and C++ because I'm not sure whether the answer differs between them, and I want to know about both.)

#include <stdio.h>

void f(const char *src, char *dst) {
    dst[2] = src[0];
    dst[1] = src[1];
    dst[0] = src[2];
}

int main() {
    char s[] = "123";
    f(s,s);
    printf("%s\n", s);
    return 0;
}

In addition to the above question, is this still well-defined if I delete the const in the original code?

Willy

3 Answers

17

While it is true that the behavior is well-defined, it is not true that compilers can "optimize for const" in the sense that you mean.

That is, a compiler is not allowed to assume that just because a parameter is a const T* ptr, the memory pointed to by ptr will not be changed through another pointer. The pointers don't even have to be equal. The const is an obligation, not a guarantee: an obligation by you (= the function) not to make changes through that pointer.

In order to actually have that guarantee, you need to mark the pointer with the restrict keyword. Thus, if you compile these two functions:

int foo(const int* x, int* y) {
    int result = *x;
    (*y)++;
    return result + *x;
}

int bar(const int* x, int* restrict y) {
    int result = *x;
    (*y)++;
    return result + *x;
}

the foo() function must read twice from x, while bar() only needs to read it once:

foo:
        mov     eax, DWORD PTR [rdi]
        add     DWORD PTR [rsi], 1
        add     eax, DWORD PTR [rdi]  # second read
        ret
bar:
        mov     eax, DWORD PTR [rdi]
        add     DWORD PTR [rsi], 1
        add     eax, eax              # no second read
        ret

See this live on GodBolt.

restrict is only a keyword in C (since C99); unfortunately, it has not been introduced into C++ so far (for the poor reason that it is more complicated to introduce in C++). Many compilers do kinda-support it, however, as __restrict.

Bottom line: The compiler must support your "esoteric" use case when compiling f(), and will not have any problem with it.


See this post regarding use cases for restrict.

einpoklum
  • `const` is not “an obligation by you (= the function) not to make changes through that pointer”. The C standard permits the function to remove `const` via a cast and then modify the object through the result. Essentially, `const` is just advisory and a convenience to the programmer to help avoid modifying an object inadvertently. – Eric Postpischil Mar 13 '20 at 11:51
  • @EricPostpischil: It's an obligation you can get out of. – einpoklum Mar 13 '20 at 11:53
  • An obligation you can get out of is not an obligation. – Eric Postpischil Mar 13 '20 at 12:01
  • @EricPostpischil: 1. You're splitting hairs here. 2. That's not true. – einpoklum Mar 13 '20 at 12:04
  • This is why `memcpy` and `strcpy` are declared with `restrict` arguments, while `memmove` is not -- only the latter allows overlap between the memory blocks. – Barmar Mar 13 '20 at 20:07
  • Additionally, it is incorrect that `const` is an obligation not to make changes through that pointer. The code `int x = 3; foo(&x); … void foo(const int *p) { * (int *) p = 4; }` is permissible and defined by the C standard. – Eric Postpischil Mar 15 '20 at 23:15
  • Please don't repeat yourself; I've already addressed this point. – einpoklum Mar 15 '20 at 23:51
  • @einpoklum: No, the answer states “a compiler is not allowed assume that just because a parameter is a `const T* ptr`, the memory pointed to by `ptr` will not be changed through another pointer.” It does not state that the compiler is not allowed to assume that the memory pointed to by `ptr` will not be changed through `ptr` itself. This is a different point. – Eric Postpischil Mar 16 '20 at 13:39
  • Changes through `ptr` itself do not involve the compiler assuming anything - it just sees those happening. The example you gave is a way to get around the obligation - that's the cast. – einpoklum Mar 16 '20 at 13:42
  • @einpoklum: Suppose that the address of an automatic `int` is taken in only one place, where it is assigned to `int const *restrict p` which has the same scope and lifetime. A compiler would be able to infer that the object won't be modified except via the original lvalue, even if the pointer was passed to a function it knew nothing about. Such inference would not be possible without the combination of `const` and `restrict`. – supercat Mar 16 '20 at 22:27
  • @supercat: In the case you describe, the compiler would _not_ be able to make that inference, since it would be an invalid one. The "function it knows nothing about" can modify the object despite taking it as `const`. – einpoklum Mar 16 '20 at 22:32
  • @einpoklum: If a pointer is qualified `const *restrict`, no pointer based upon it may access any object whose value would change during its lifetime. From paragraph 4 of N1570 6.7.3.1: "If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified." So if a restrict pointer's type is const-qualified, modification to an object accessed through a pointer derived from it would imply a violation of that constraint. – supercat Mar 16 '20 at 22:42
  • @supercat: What are L, X and T? – einpoklum Mar 16 '20 at 22:47
  • @einpoklum: `L` is an lvalue derived from a `restrict`-qualified pointer. `T` is the type of object the pointer is declared as identifying. `X` is the object designated by the lvalue `L`. – supercat Mar 16 '20 at 22:49
  • @supercat: So, are you saying [this](https://godbolt.org/z/LdZBgZ) is not a valid program? And that gcc is wrong not to emit an error? – einpoklum Mar 16 '20 at 23:00
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/209730/discussion-between-einpoklum-and-supercat). – einpoklum Mar 16 '20 at 23:00
5

This is well-defined (in C++, not sure in C any more), with and without the const qualifier.

The first thing to look for is the strict aliasing rule (footnote 1). If src and dst point to the same object:

  • in C, they must be of compatible types; char* and char const* are not compatible.
  • in C++, they must be of similar types; char* and char const* are similar.

Regarding the const qualifier, you might argue that since when dst == src your function effectively modifies what src points to, src shouldn't be qualified as const. This is not how const works. Two cases need to be considered:

  1. When an object is defined to be const, as in char const data[42];, modifying it (directly or indirectly) leads to Undefined Behaviour.
  2. When a reference or pointer to a const object is defined, as in char const* pdata = data;, one can modify the underlying object provided it has not been defined as const (see 1. and footnote 2). So the following is well-defined:
int main()
{
    int result = 42;
    int const* presult = &result;
    *const_cast<int*>(presult) = 0;
    return *presult; // 0
}

1) What is the strict aliasing rule?
2) Is const_cast safe?

YSC
  • Maybe the OP means possible reordering of the assignments? – Igor R. Mar 13 '20 at 10:05
  • `char*` and `char const*` are not compatible. `_Generic((char *) 0, const char *: 1, default: 0)` evaluates to zero. – Eric Postpischil Mar 13 '20 at 11:58
  • The phrasing “When a reference or a pointer to a `const` object is defined” is incorrect. You mean that when a reference or pointer to a `const`-qualified **type** is defined, that does not mean the object it is set to point to may not be modified (by various means). (If the pointer does point to a `const` object, that means the object is indeed `const` by definition, so the behavior of trying to modify it is not defined.) – Eric Postpischil Mar 13 '20 at 12:00
  • @Eric, I'm only that specific when the question is about the Standard or tagged `language-lawyer`. Exactitude is a value I cherish, but I'm also aware it comes with more complexity. Here, I decided to go for simplicity and easy-to-understand sentences, because I believe this is what OP wanted. If you think otherwise please do answer, I'll be amongst the first to upvote it. Anyway, thank you for your comment. – YSC Mar 13 '20 at 13:50
3

This is well-defined in C. Strict aliasing rules do not apply with the char type, nor with two pointers of the same type.

I'm not sure what you mean by "optimize for const". My compiler (GCC 8.3.0 x86-64) generates the exact same code for both cases. If you add the restrict specifier to the pointers, then the code generated is slightly better, but that won't work for your case, the pointers being the same.

(C11 §6.5, paragraph 7)

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

In this case (without restrict), you will always get 121 as a result.

S.S. Anne