
Basically, is this code legal when strict aliasing is enabled?

void f(int *pi) {
    void **pv = (void **) &pi;
    *pv = NULL;
}

Here, we access an object of one type (int*) through a pointer of another type (pointer to void *), so I would say that it is indeed a strict-aliasing violation.

But a sample attempting to highlight the undefined behavior makes me doubt (even if it does not prove that it is legal).

First, if we alias int * and char *, we can get different values depending on the optimization level (so it is definitely a strict-aliasing violation):

#include <stdio.h>

static int v = 100;

void f(int **a, char **b) {
    *a = &v;
    *b = NULL;
    if (*a)
        // *b == *a (NULL)
        printf("Should never be printed: %i\n", **a);
}

int main() {
    int data = 5;
    int *a = &data;
    f(&a, (char **) &a);
    return 0;
}
$ gcc a.c && ./a.out
$ gcc -O2 -fno-strict-aliasing a.c && ./a.out
$ gcc -O2 a.c && ./a.out
Should never be printed: 100

But the very same sample with void ** instead of char ** does not exhibit the undefined behavior:

#include <stdio.h>

static int v = 100;

void f(int **a, void **b) {
    *a = &v;
    *b = NULL;
    if (*a)
        // *b == *a (NULL)
        printf("Should never be printed: %i\n", **a);
}

int main() {
    int data = 5;
    int *a = &data;
    f(&a, (void **) &a);
    return 0;
}
$ gcc a.c && ./a.out
$ gcc -O2 -fno-strict-aliasing a.c && ./a.out
$ gcc -O2 a.c && ./a.out

Is it just accidental? Or is there an explicit exception in the standard for void **?

Or maybe just the compilers handle void ** specifically because in practice (void **) &a is too common in the wild?

Lundin
rom1v
  • There is nothing here which can break the strict aliasing rules. You assign the **pointer**; you do not reference the data through the incompatible reference. NULL is good for any pointer type. BTW, you silenced the warning when calling the function by applying the cast. – 0___________ Nov 09 '21 at 10:15
  • 1
    But if we use `char **` instead of `void **`, we exhibit the undefined behavior. Here, the data is `int *` (not `int`) and we access it through a pointer to `void *`, so we access it though an incompatible (?) pointer. – rom1v Nov 09 '21 at 10:21
  • 2
    @0___________ These are strict aliasing violations of the pointer variable itself, not at the pointed-at data. – Lundin Nov 09 '21 at 10:29

3 Answers


Basically, is this code legal when strict aliasing is enabled?

No. The effective type of pi is int*, but you access the pointer variable through an lvalue of type void*. Dereferencing a pointer to perform an access that doesn't correspond to the effective type of the object is a strict aliasing violation. There are some exceptions, but this isn't one of them.
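A minimal sketch of how the same effect can be had without the violation, assuming the goal is to let a generic helper that takes a void ** reset an int *: pass the address of a genuine void * temporary and convert back afterwards. The names set_null, reset_int_ptr and demo_reset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic helper: writes through a void ** that must point
   at a real void * object. */
void set_null(void **pv) {
    *pv = NULL;
}

/* Hypothetical wrapper: resets an int * without ever accessing it
   through a void * lvalue. */
void reset_int_ptr(int **ppi) {
    void *tmp = *ppi;   /* read the int * with its own type, convert to void * */
    set_null(&tmp);     /* tmp really is a void *, so the access matches its effective type */
    *ppi = tmp;         /* convert back and store through an int * lvalue */
}

/* Usage: the int * ends up null, and no object is accessed through the wrong type. */
void demo_reset(void) {
    int x = 5;
    int *p = &x;
    reset_int_ptr(&p);
    assert(p == NULL);
}
```

The temporary costs a copy, but every access now matches the effective type of the object it touches.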

In your second example, both parameters of the function point at the same object, of effective type int*; this is set up by the call f(&a, (char **) &a);. Therefore *b inside the function is indeed a strict aliasing violation, since you are using a char* lvalue for the access.

In your third example you do the same but with a void*. This is also a strict aliasing violation. There is nothing special with void* or void** in this context.

Why your compiler exhibits one particular form of undefined behavior in some situations is not very meaningful to speculate about. That said, void* must by definition be convertible to and from any other object pointer type, so in practice the pointer types very likely share the same internal representation, even though the standard does not explicitly require it.
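To illustrate the distinction drawn here: the standard guarantees that converting an object pointer to void * and back yields a pointer that compares equal to the original (C11 6.3.2.3p1). That is a statement about converting values, not about reading one pointer type's bytes as another. A minimal sketch with hypothetical function names:

```c
#include <assert.h>

/* Round trip int * -> void * -> int *: the result must compare equal to
   the original pointer. Nothing here reinterprets memory. */
int *round_trip(int *p) {
    void *v = p;    /* implicit conversion to void * */
    return v;       /* implicit conversion back to int * */
}

void demo_round_trip(void) {
    int x = 42;
    int *back = round_trip(&x);
    assert(back == &x);
    assert(*back == 42);
}
```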

Also, in one of your builds you are using -fno-strict-aliasing, which turns off various pointer aliasing-based optimizations. If you wish to provoke strange and unexpected results, you shouldn't use that option.

Lundin
  • I thought that casting a pointer to (char *) was not a violation of the strict aliasing rule; is it different for char **? Is it really a strict aliasing rule issue, or rather that the order of the instructions is not guaranteed? – Guillaume Petitjean Nov 09 '21 at 10:49
  • 1
    @GuillaumePetitjean There is no cast to `char*` anywhere in this code. And if it was, the exception from the strict aliasing rule related to `char*` only applies if you de-reference them and access the lvalue as a character type, in order to get the raw binary representation. Just because there's an exception to character type lvalues it makes no sense to assume that there will be an exception for dereferencing some unrelated `char**` type. – Lundin Nov 09 '21 at 10:57
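A small sketch of what the character-type exception does permit, with hypothetical names: reading an object's bytes through an unsigned char lvalue. By contrast, in the question's second example *b has type char *, which is a pointer type rather than a character type, so the exception does not apply to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the object representation of an int into buf by reading it
   through unsigned char lvalues, which the character-type exception allows. */
void int_bytes(const int *p, unsigned char buf[sizeof(int)]) {
    const unsigned char *bytes = (const unsigned char *)p;
    for (size_t i = 0; i < sizeof(int); i++)
        buf[i] = bytes[i];   /* each read is a character-type access */
}

void demo_bytes(void) {
    int x = 0x1234;
    unsigned char buf[sizeof(int)];
    int_bytes(&x, buf);
    assert(memcmp(buf, &x, sizeof x) == 0);  /* same bytes memcmp sees */
}
```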

Yes, void * and char * are special.

Is void** an exception to strict aliasing rules?

You are not aliasing through the void ** type; you are aliasing through void *. In *pv = NULL, the type of *pv is void *.
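One way to check the claim about the type of *pv, assuming a C11 compiler: _Generic selects on the static type of its controlling expression without evaluating it, so the hypothetical TYPE_NAME macro below reports the type of the lvalue *pv without dereferencing anything.

```c
#include <assert.h>
#include <string.h>

/* Names the static type of an expression. The controlling expression of
   _Generic is not evaluated, so *pv below is never actually accessed. */
#define TYPE_NAME(x) _Generic((x),   \
    void *: "void *",                \
    char *: "char *",                \
    int *: "int *",                  \
    default: "other")

void demo_type_of_deref(void) {
    void **pv = NULL;   /* never dereferenced at run time */
    int **pi = NULL;
    assert(strcmp(TYPE_NAME(*pv), "void *") == 0);  /* *pv has type void * */
    assert(strcmp(TYPE_NAME(*pi), "int *") == 0);
}
```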

Generally, the C standard allows different types of pointers to have different representations. They can even have different sizes. However, it requires some pointer types to have the same representations. C 2018 6.2.5 28 says [separated into bullet points by me for clarity]:

  • A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.49)
  • Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements.
  • All pointers to structure types shall have the same representation and alignment requirements as each other.
  • All pointers to union types shall have the same representation and alignment requirements as each other.
  • Pointers to other types need not have the same representation or alignment requirements.
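Some of these requirements have consequences that can be checked at compile time, assuming C11 (static_assert from <assert.h>, and _Alignof). The same representation implies the same size and alignment, so the assertions below must hold on any conforming implementation; the struct and union tags are hypothetical. The converse does not follow: equal sizes alone would not prove equal representations, and there is no corresponding guarantee for, say, int * versus char *.

```c
#include <assert.h>   /* provides the C11 static_assert macro */

struct s1; struct s2;   /* incomplete types are enough to form pointer types */
union u1; union u2;

/* Each check follows from C 2018 6.2.5 28. */
static_assert(sizeof(void *) == sizeof(char *), "void * and char * match");
static_assert(_Alignof(void *) == _Alignof(char *), "void * and char * align alike");
static_assert(sizeof(const int *) == sizeof(int *), "qualified versions match");
static_assert(sizeof(struct s1 *) == sizeof(struct s2 *), "struct pointers match");
static_assert(sizeof(union u1 *) == sizeof(union u2 *), "union pointers match");
```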

Footnote 49 says:

The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
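The footnote mentions union members explicitly. A minimal sketch of that case, with hypothetical names: store a void * into a union and read it back as a char *. Reading a member other than the one last written reinterprets the stored bytes (C11 6.5.2.3), and because the two types must share a representation, the result is the same pointer.

```c
#include <assert.h>

/* A union whose members must share a representation (C 2018 6.2.5 28). */
union ptr_pun {
    void *v;
    char *c;
};

char *via_union(void *p) {
    union ptr_pun u;
    u.v = p;       /* store as void * */
    return u.c;    /* read the same bytes back as char * */
}

void demo_union(void) {
    char buf[4] = "abc";
    assert(via_union(buf) == buf);
    assert(*via_union(buf + 1) == 'b');
}
```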

A footnote is not a normative part of the standard; that is, it does not form a rule that implementations must conform to. However, the note appears to tell us that, regardless of the formal rules, you should be able to use a void * in place of a char * in certain places and vice-versa. Stating that two things should be interchangeable reads like a rule. My interpretation is that the authors intended void * and char * to be interchangeable, at least to some extent, but did not have formal wording suitable for the normative part of the C standard. There are in fact defects in the C standard’s treatment of aliasing, such as this one, and its rules really need to be rewritten.

So, although this is not a normative part of the standard, compiler developers may give it deference and support aliasing a char * with a void * and vice-versa. This could explain why some aliasing involving void * behaves as if it is supported in practice, even though the formal rules do not guarantee it.

Eric Postpischil
  • Binary compatibility is a different matter than type compatibility. 6.2.7 compatible types sends us to 6.7.6.1 which is clear: "For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types." `char` is not compatible with `void`. – Lundin Nov 09 '21 at 14:42
  • @Lundin: Yes, binary compatibility is different from type compatibility, but the note is telling us there is more than just binary compatibility. The normative text says there is binary compatibility, and the note goes further and tells us there is interchangeability: in some situations, `void *` may be substituted for `char *` and vice-versa. The note seeks to give some interchangeability that the normative text fails to express. – Eric Postpischil Nov 09 '21 at 20:05

While char* and void* are required to have matching representations, some platforms use a different representation for int*. Thus, any code that relies upon the ability to use a dereferenced void** to access all pointer types interchangeably would not be portable to such machines, and is, from the Standard's point of view, "non-portable". The Standard therefore waives jurisdiction over whether any particular implementation should support such constructs. Implementations that do so will be more suitable for low-level programming than those that don't, and thus quality implementations which are designed and configured to be suitable for that purpose will do so. Note, however, that neither clang nor gcc is designed to be particularly suitable for low-level programming except when using the -fno-strict-aliasing flag.
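One familiar instance of the construct described here is a generic free_and_null(void **) helper called as free_and_null((void **)&p) for some int *p. A minimal sketch of a portable alternative, assuming a macro is acceptable: the hypothetical FREE_AND_NULL macro assigns through the pointer's own type, so no dereferenced void ** is involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Non-portable pattern (shown only as a comment):
 *     void free_and_null(void **pp) { free(*pp); *pp = NULL; }
 *     free_and_null((void **)&int_ptr);  // accesses an int * as a void *
 *
 * Hypothetical portable alternative: every access uses the pointer's
 * declared type. Note that p is evaluated twice. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

void demo_free_and_null(void) {
    int *ip = malloc(sizeof *ip);
    double *dp = malloc(sizeof *dp);
    FREE_AND_NULL(ip);   /* assigns through an int * lvalue */
    FREE_AND_NULL(dp);   /* assigns through a double * lvalue */
    assert(ip == NULL && dp == NULL);
}
```

The trade-off is that the macro argument is evaluated more than once, so it should not have side effects.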

To clarify why a platform might use different representations for int* and char*: some hardware platforms don't allow direct addressing of memory in chunks smaller than 16 bits. The Standard would allow a compiler for such a platform to store things in a variety of ways, with different trade-offs between performance, storage efficiency, and compatibility with code that expects char to be 8 bits:

  1. Simply make char match the size of the smallest directly addressable storage unit (e.g. making both char and int be 16 bits). I've used a compiler that did that. This approach would likely offer the best performance, but code that uses large arrays of unsigned char to hold octets would waste half of their storage.

  2. Store 8 bits of useful data in each char, leaving the other 8 unused. Store 16-bit values split between two words, and 32-bit values split among four. This would offer excellent compatibility, but lousy performance and storage efficiency.

  3. Implement char* as a combination of a pointer to a 16-bit word, a bit which indicates which half of the word it should identify, and 15 padding bits, but implement int* as simply a pointer to a 16-bit word.

  4. Implement char* as above, but add a padding byte to int*. This would improve compatibility, but waste some storage.

No single approach would be best for all applications, but the Standard would allow implementations to select whichever approach or approaches (perhaps selectable via command-line switches) would be most useful for their customers.
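To make approach 3 concrete, here is a hypothetical model written in portable C: uint16_t words stand in for the machine's memory, a plain word pointer plays the role of int *, and a word pointer plus a half-select flag plays the role of char *. The layout (low octet first) and all names are assumptions for illustration, not a description of any real compiler. Reading an octet_ptr's bytes as if they were a word_ptr, which is what an aliasing void ** access would amount to, would not yield a meaningful address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word-addressed machine: memory is only addressable in 16-bit words. */
typedef uint16_t *word_ptr;    /* like int *: a single word address */

typedef struct {
    uint16_t *word;            /* which 16-bit word */
    unsigned high;             /* 0 = low octet, 1 = high octet (assumed order) */
} octet_ptr;                   /* like char *: word address plus a half bit */

/* int * -> char * conversion: point at the first octet of the word. */
octet_ptr to_octet_ptr(word_ptr w) {
    octet_ptr p = { w, 0 };
    return p;
}

/* Load the octet that a char * would designate. */
unsigned load_octet(octet_ptr p) {
    return p.high ? (unsigned)(*p.word >> 8) : (unsigned)(*p.word & 0xFFu);
}

/* Advance a char * by one octet, moving to the next word after the high half. */
octet_ptr next_octet(octet_ptr p) {
    if (p.high) {
        p.word++;
        p.high = 0;
    } else {
        p.high = 1;
    }
    return p;
}

void demo_word_machine(void) {
    uint16_t mem[2] = { 0xBBAA, 0xDDCC };
    octet_ptr p = to_octet_ptr(mem);
    assert(load_octet(p) == 0xAA);
    p = next_octet(p);
    assert(load_octet(p) == 0xBB);
    p = next_octet(p);
    assert(load_octet(p) == 0xCC);
}
```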

supercat