Some sources say the following is a violation of strict aliasing:

#include <stdlib.h>

typedef struct Foo {
    int a;
    double* b;
} Foo;
int main() {

    _Alignas(Foo) unsigned char buffer[2048];
    Foo* a = (Foo*)&buffer[0];
    a->a = 44;
    a->b = NULL;
}

GCC at least does not throw an error: https://godbolt.org/z/Tbzaodb8W

If this is undefined behaviour, how could any allocator be implemented, especially bump allocators that hand out memory from such an unsigned char buffer?

I certainly know that

Foo MyFoo;
...
unsigned char* byteRepresentationOfFoo = (unsigned char*)&MyFoo;

is allowed, since unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?

Raildex
  • "(ignore memory alignment for simplicity)" – Raildex Jul 06 '23 at 05:50
  • how so? If I jump around in `buffer` until I hit a byte that fits the alignment of `Foo`, is it legal? – Raildex Jul 06 '23 at 05:52
  • @ikegami added `_Alignas` – Raildex Jul 06 '23 at 05:55
  • gcc, strict-aliasing, and horror stories: https://stackoverflow.com/a/2959468/421195 – paulsm4 Jul 06 '23 at 06:08
  • Thorough explanation of the strict aliasing rule: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule. Discussion of bump allocators: https://stackoverflow.com/questions/62076569/is-this-implementation-of-malloc-a-bump-allocator – nielsen Jul 06 '23 at 07:23
  • Basically, gcc past version 5.x or so doesn't care about being a quality implementation and clang never cared to be one either. If you want extensive diagnostic messages then use a different compiler or a static analyzer. Or use gcc < version 7, that works too. – Lundin Jul 06 '23 at 08:20
  • 2
    Any allocator that does not use `malloc` and friends inside is in violation of the strict aliasing rules, because the standard singles out `malloc` and friends as functions that can create objects with no declared type. All other objects have declared type and thus are subject to the strict aliasing rules. – n. m. could be an AI Jul 06 '23 at 09:35

3 Answers

4

Yes, it is a strict aliasing violation and therefore undefined behavior. The declared type of the object is unsigned char [2048], so its "effective type" is the same (C17 6.5 §6).

a->a = 44 is an lvalue expression that accesses the object through a type (int, via Foo) that is not compatible with its effective type, unsigned char (C17 6.5 §7).

As a side note it would have been ok to do stuff like this in case Foo was a struct/union containing a character type array among its members (C17 6.5 §7), but in this case it is not.

GCC at least does not throw an error

It's not gcc's job to report undefined behavior. The various strict aliasing warnings have always been pretty broken and gcc is also known to be lax when it comes to warning for non-standard extensions in general. clang doesn't throw any diagnostics either.

Related: Why did gcc stop warning about strict aliasing violation from version 7.2? The answer is likely "because bugs". In recent years they have been rolling out many poorly tested changes to the compiler. It took gcc 28 years to get from version 1 to version 5, but from there on it has been one major version release per year up to 13.x... They rolled out 8 major versions from 2015-2023 while there was only 1 minor revision to the actual C language in that time.


how would any allocator be implemented

It can't, or at least not by taking a raw character buffer and casting pointers into it at will. malloc and the like are library functions, which may be implemented in non-standard C or in another language entirely.

As it happens, most standard libraries, glibc among them, are written in C. But if you compile those with a strictly conforming C compiler rather than with specialized options like gcc -fno-strict-aliasing, then there are no guarantees. Follow the build instructions of the library implementors.

You can implement allocators by type punning unions though:

typedef union
{
  Foo foo;
  unsigned char buf[n];
} pun_intended_t;

_Alignas(Foo) pun_intended_t pun = { .buf =  { something } };
Foo* f = (Foo*)pun.buf; // well-defined as long as aligned

This utilizes another exception from "strict aliasing".
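As a rough, compilable sketch of the union idea (the names POOL_BYTES, foo_storage_t, storage and storage_as_foo are hypothetical and not part of the snippet above): the object's declared type is the union, so going through its Foo member accesses a Foo with an lvalue of type Foo, while the buf member still gives a byte view of the same storage.

```c
#include <stddef.h>

typedef struct Foo {
    int a;
    double *b;
} Foo;

/* Hypothetical size, chosen to match the 2048-byte buffer in the question. */
enum { POOL_BYTES = 2048 };

/* The union is the declared type of the storage. It is at least as
   strictly aligned as Foo, so no extra _Alignas is needed here. */
typedef union {
    Foo foo;
    unsigned char buf[POOL_BYTES];
} foo_storage_t;

static foo_storage_t storage;

/* Initializes the Foo member and returns a pointer to it. */
static Foo *storage_as_foo(void)
{
    storage.foo.a = 44;
    storage.foo.b = NULL;
    return &storage.foo;
}
```

Both members start at the address of the union itself, so &storage.foo and storage.buf refer to the same first byte.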


unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?

Rather, any object may be accessed using a character type but not the other way around. This is because of two special rules:

C17 6.5 §7 ("the strict aliasing rule"):

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
/--/

  • a character type.

As well as C17 6.3.2.3 §7

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
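The permitted direction can be sketched like this, assuming the Foo type from the question (foo_bytes is a hypothetical helper name): the object is a Foo, and it is only read byte by byte through an unsigned char lvalue.

```c
#include <stddef.h>
#include <string.h>

typedef struct Foo {
    int a;
    double *b;
} Foo;

/* Copies the object representation of *foo into out, one byte at a time.
   The cast yields a pointer to the lowest addressed byte (C17 6.3.2.3 §7)
   and each read uses a character type (C17 6.5 §7). */
static void foo_bytes(const Foo *foo, unsigned char out[sizeof(Foo)])
{
    const unsigned char *bytes = (const unsigned char *)foo;
    for (size_t i = 0; i < sizeof(Foo); i++) {
        out[i] = bytes[i];
    }
}
```

Padding bytes inside Foo are copied as well; their values are unspecified, so only the bytes that belong to members carry meaning.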

Lundin
  • 1
    *They rolled out 8 major versions from 2015-2023 while there was only 1 minor revision to the actual C language in that time.* When your only tool is "Write more code!" that's all you do. And when you're a compiler developer for an already-mature compiler for a language that's been stable for 12 years, well, you gotta do something to prove your worth. You seem to have tweaked a GCC fan... :-) – Andrew Henle Jul 06 '23 at 17:56
  • @AndrewHenle If it ain't broken don't fix it... – Lundin Jul 07 '23 at 06:39
  • @AndrewHenle: It's also important to view the Standard as holy writ if it could be interpreted as justifying an incorrect assumption by the gcc optimzier, but as defective in cases where it cannot be thus interpreted. – supercat Jul 13 '23 at 21:53
3

Strict aliasing is well discussed in other questions, e.g.: What is the strict aliasing rule?

Hence, there is no "if this is undefined behavior". It is, and the compiler is not required to detect it (by warning or raising an error). End of answer.

However, we can look at what some of the potential problems really are.

The most obvious is alignment. A buffer allocated as

   char buffer[sizeof(struct T)];

has only the alignment requirement of char, while struct T may well require stricter alignment. On some systems, this means that accessing buffer as a struct T is slower. On others, the misaligned access may trigger a hardware fault at run time.

This can be tricky to catch because code could be working perfectly fine (since buffer by coincidence is aligned correctly) and suddenly after an unrelated code change, the alignment changes and the program crashes. This problem can be avoided by using the _Alignas modifier as in the question.
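A small sketch of how the alignment of a buffer could be checked at run time. The struct T layout and the helper name is_aligned_for_T are made up for illustration, and the check assumes that converting a pointer to uintptr_t yields a plain address, which holds on mainstream platforms but is implementation-defined.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct standing in for struct T. */
struct T {
    int a;
    double d;
};

/* Returns true if p is suitably aligned for a struct T.
   Assumes uintptr_t holds the numeric address of p. */
static bool is_aligned_for_T(const void *p)
{
    return (uintptr_t)p % alignof(struct T) == 0;
}
```

A check like this only addresses alignment. It does not make access through a struct T pointer into a char array valid with respect to strict aliasing.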

Another problem is code optimization. Suppose that we have

   _Alignas(struct T) char buffer[sizeof(struct T)] = {0};
   struct T * p = (struct T *)&buffer[0];
   p->a = 1;
   transmit(buffer);

Due to the strict aliasing rule, the compiler is allowed to assume that the statement p->a = 1 does not change buffer. Thus, for the sake of this code snippet, that assignment can be optimized away since *p is otherwise unused.

This issue can be tricky because the code may work perfectly for years until some day a new version of the compiler is used where more aggressive optimization has been implemented and suddenly the code stops working (a common symptom of undefined behavior).
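One common way to sidestep this particular problem is to build the value in a properly typed object and copy its bytes into the buffer with memcpy, which accesses memory as characters. The following is only a sketch: struct T is given a single hypothetical member and fill_buffer is a made-up helper.

```c
#include <string.h>

/* Hypothetical struct standing in for struct T. */
struct T {
    int a;
};

/* Writes the bytes of a struct T with a == 1 into buffer without ever
   accessing buffer through a struct T lvalue. */
static void fill_buffer(unsigned char buffer[sizeof(struct T)])
{
    struct T value = { .a = 1 };
    memcpy(buffer, &value, sizeof value);
}
```

The buffer can then be passed on as bytes, and a receiver can memcpy it back into a struct T. Optimizing compilers typically turn such small fixed-size copies into plain loads and stores, though that is not guaranteed.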

So a bump allocator needs to have an interface just like e.g. malloc where a void * is returned which is guaranteed to fulfil the alignment requirement of any type [which the allocator may be used for]. This pointer is then assigned to a pointer of the specific type and afterwards that memory area is only referred to using that type.
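A minimal sketch of such an interface, under the assumption that the backing storage is obtained from malloc, so that it has no declared type and takes on the type of whatever is stored into it. All names (Arena, arena_init, arena_alloc, arena_destroy) are hypothetical, and nothing is freed individually.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct Arena {
    unsigned char *base;  /* start of the malloc'ed block */
    size_t capacity;      /* total bytes available */
    size_t used;          /* bytes handed out so far */
} Arena;

/* Returns 1 on success, 0 if the backing allocation failed. */
static int arena_init(Arena *arena, size_t capacity)
{
    arena->base = malloc(capacity);
    arena->capacity = capacity;
    arena->used = 0;
    return arena->base != NULL;
}

/* Returns a pointer aligned for any object type, or NULL when full. */
static void *arena_alloc(Arena *arena, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    /* Round the current offset up to the next multiple of align. */
    size_t offset = (arena->used + align - 1) / align * align;
    if (offset > arena->capacity || size > arena->capacity - offset) {
        return NULL;
    }
    arena->used = offset + size;
    return arena->base + offset;
}

static void arena_destroy(Arena *arena)
{
    free(arena->base);
    arena->base = NULL;
    arena->capacity = 0;
    arena->used = 0;
}
```

A caller would write something like Foo *f = arena_alloc(&arena, sizeof *f); and from then on refer to that memory only through f.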

nielsen
0

Your code would be conforming, though probably not strictly conforming.

Compilers that are designed and configured to support low-level programming constructs will support constructs like yours without regard for whether the Standard requires them to do so. Compiler configurations that are not designed to be suitable for low-level programming tasks should not be used to perform such tasks. The fact that code such as yours is unsuitable for use with compilers that are unsuitable for low-level programming tasks does not imply any defect in your code, nor the compiler. Instead, it's a consequence of the fact that the Standard deliberately allows implementations specialized for some kinds of tasks to process code in ways that make them unsuitable for many other kinds of tasks.

The "strict aliasing rules" were never intended to partition the universe of programming constructs into those which all implementations must support and those which no programmers must ever use. Instead, the intention was to avoid forbidding compiler writers from performing aliasing optimizations in ways that would be maximally useful to their primary users/customers, recognizing that people wanting to sell compilers would be better able to judge their customers' needs than the Committee ever could.

The Standard makes no effort to enumerate all of the constructs that its authors thought it obvious compilers should support. Aside from the scenario of an lvalue being used to access an object of its own type, all of the constructs that the Standard lists as allowable are things for which, absent such a requirement, a compiler writer sincerely trying to write a good compiler might plausibly decide support wasn't necessary.

supercat