
Consider a C99 program that reads from a read-only binary blob linked into the program's binary through a linker script. The program knows where the blob starts in memory, but the blob's layout is not known at compile time. The blob consists of unsigned 32-bit and 64-bit integers. We took care to make sure that their endianness matches the (data) endianness of the platform in use. We also took care to place the blob in memory so that it is 4-byte aligned.

Requirements:

  1. (performance) We want to read both 32-bit and 64-bit integers with the minimum number of instructions each individual platform allows (e.g. a single load instruction where applicable).

    • we do not want to read the value byte-by-byte and then use shifting and adding to reconstruct the 4B/8B integer.
  2. (portability) This program must run on ARM, x86_64 and MIPS architectures. Also some architectures have 32-bit system bus, others have 64-bit bus.

    • we do not want to have to maintain arch-specific adaptations for each architecture with inlined assembly code.
    • we do not want to make assumptions about the toolchain used, e.g. we don't want to rely on -fno-strict-aliasing or similar flags.

Seemingly, this could be done with type punning. We know where in memory the value we want to read is, and we can cast the pointer from the original (unsigned char*) to either uint32_t* or uint64_t*.

But C99's strict aliasing rules confuse me.

There will be no aliasing, of that we can be sure: we would never pun the same memory location to two different types other than unsigned char. The layout of the binary blob does not allow this.

Question:

Is casting a const uint8_t* to const uint32_t*, or const uint64_t* well-defined in C99, as long as we are sure we do not alias the same pointers to both const uint32_t* and const uint64_t*?

  • If you have a `void *p = pointerToAPositionInACharBuffer;` and do `*(uint32_t*)p`, you violate strict aliasing rules and the behavior is undefined. You should get the `uint32_t` through `memcpy`. It's a builtin on all optimizing compilers, so no need to worry about performance loss compared to `*(uint32_t*)p`: https://gcc.godbolt.org/z/6E9YMG. – Petr Skocik Nov 08 '20 at 00:21
  • For clarification, is the blob always the same endianness for every single platform that you run it on? Also, is the length of the blob known by the program? –  Nov 08 '20 at 00:32
  • @PSkocik Thanks! I will try that. Do you think you could put together a full answer, also with an explanation of why it violates strict aliasing rules even if there is no aliasing? – Matej Kubicka Nov 08 '20 at 00:34
  • Why does aliasing blow up? If you write using a pointer of one type and read the same memory using a pointer of a different type, an optimizing compiler may not actually go read the memory if it thinks it knows what's there. If your memory is initialized at link time and you are only reading it, you should not get into trouble even with multiple pointers of different types pointing to the same memory. –  Nov 08 '20 at 00:36
  • @JadenGarcia The size of the blob can be determined from information within the blob. As for endianness, there are both little- and big-endian versions of the blob available; different blobs are patched in depending on what is being built. – Matej Kubicka Nov 08 '20 at 00:38
  • @dratenik That is exactly what the question is about. As long as it is only reading, and there is no aliasing, I do not see how any optimization can rely on us never performing type punning. – Matej Kubicka Nov 08 '20 at 00:41
  • @PSkocik how is `*(uint32_t*) p` undefined? The `p` holds the address divisible by 4 (as stated in the question) and thus `*(uint32_t*) p` is totally defined. – mercury0114 Nov 08 '20 at 17:35
  • @PSkocik and also, `memcpy` is a function that's meant for copying possibly large chunks of memory. The implementation of `memcpy` might contain a for loop. Thus, calling such function will be substantially slower than performing a cast. – mercury0114 Nov 08 '20 at 17:37
  • @mercury0114 See my answer. It's undefined because it violates 6.5p7. C isn't really a portable assembler. It has rules beyond those imposed by hardware. Some don't like some of those rules and compile with -fno-strict-aliasing. I didn't make the rules. – Petr Skocik Nov 08 '20 at 20:24
  • @mercury0114 Effectively, modern compilers make `memcpy` a builtin/operator that only sometimes generates a call to a function of the same name. – Petr Skocik Nov 08 '20 at 20:25
  • Any toolchain suitable for embedded programming will support a flag equivalent to `-fno-strict-aliasing`. Further, whether by bug or design, neither clang nor gcc correctly distinguishes between corner cases whose behavior is mandated by the Standard and those whose behavior is not. For example, they sometimes treat an action which overwrites an object of one type with an object of another type whose bit pattern is the same as a no-op, without recognizing that the action sets the Effective Type of the object. – supercat Nov 10 '20 at 22:32
  • Related: [c - Strict aliasing rule uint8_t buffer to structure - Stack Overflow](https://stackoverflow.com/questions/54237004/strict-aliasing-rule-uint8-t-buffer-to-structure?noredirect=1&lq=1) – user202729 Aug 17 '21 at 04:49
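A minimal sketch of the `memcpy`-based reading suggested in the first comment, covering both widths the question needs. The helper names `blob_read_u32`/`blob_read_u64` and the byte-offset interface are hypothetical; the sketch assumes, as the question states, that the blob's byte order already matches the host's. Whether each call compiles to a single load depends on the target and compiler (see the MIPS discussion under the first answer).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a native-endian uint32_t at byte offset `off`
   of a read-only blob. memcpy copies the object representation byte-wise,
   so no lvalue of type uint32_t ever accesses the blob and 6.5p7 is not
   violated. It also works for any alignment of blob + off. */
static inline uint32_t blob_read_u32(const unsigned char *blob, size_t off)
{
    uint32_t v;
    memcpy(&v, blob + off, sizeof v);
    return v;
}

/* Same for a native-endian uint64_t. */
static inline uint64_t blob_read_u64(const unsigned char *blob, size_t off)
{
    uint64_t v;
    memcpy(&v, blob + off, sizeof v);
    return v;
}
```

Because the size passed to `memcpy` is a compile-time constant, optimizing compilers usually replace the call with an ordinary load on targets that permit it.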

2 Answers


The strict aliasing rules are effectively (pun intended (the 2nd pun intended too)) 6.5p6 and 6.5p7.

If you read through a declared char buffer, e.g.:

```c
char buf[4096];
//...
read(fd, buf, sizeof(buf));
//...
```

and want to do `*(uint32_t*)(buf+position)`, then you're definitely violating

6.5p7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,

If you mmap or malloc the buffer (making the memory dynamically typed), then it's more complicated, but in either case the standard-compliant way of reading such a uint32_t (through memcpy) works and typically carries no performance penalty, because optimizing compilers recognize memcpy calls and treat them specially.

Example:

```c
#include <stdint.h>
#include <string.h>

//plain cast: violates 6.5p7 if P points into a declared char[] buffer
uint32_t get32_noalias(void const *P)
{
     return *(uint32_t const*)(P);
}


static inline uint32_t get32_inl(void const *P)
{
    uint32_t const*p32 = P;
    //^optional (might not affect codegen)
    //to assert that P is well-aligned for uint32_t
    uint32_t x; memcpy(&x,p32,sizeof(x));
    return x;
}

//should generate same code as get32_noalias
//but without violating 6.5p7 when P points to a char[] buffer
uint32_t get32(void const *P)
{
    return get32_inl(P);
}
```

https://gcc.godbolt.org/z/sGf4rf

Generated assembly on x86-64:

```asm
get32_noalias:                          # @get32_noalias
        movl    (%rdi), %eax
        retq

get32:                                  # @get32
        movl    (%rdi), %eax
        retq
```

While `*(uint32_t*)p` probably won't blow up in your case in practice (if you only do read-only accesses, or read-only accesses intertwined with char-based writes like those done by the read syscall, then it "practically" shouldn't blow up), I don't see a reason to avoid the fully standard-compliant memcpy-based solution.

Petr Skocik
  • Try with `MIPS64 gcc 5.4 (el)`, the assembly code for `get32` becomes considerably more complicated. – mercury0114 Nov 08 '20 at 17:58
  • @mercury0114 You have a point: under MIPS the `memcpy` is not inlined, and the generated code that we can see there is effectively a function call (e.g. push a new frame to the stack and a "jump and link" instruction: `jal memcpy`). I think this is done because MIPS does not support direct unaligned word access (e.g. with the `lw` instruction) and the compiler cannot be sure the source pointer passed to `memcpy` is aligned. – Matej Kubicka Nov 08 '20 at 21:36
  • @MatejKubicka `__builtin_assume_aligned(P,_Alignof(*p32))` fixes that: https://gcc.godbolt.org/z/M533Eo. It shouldn't be required, because the `uint32_t const*p32 = P;` conversion (`void const*` to `uint32_t const*`) is undefined iff P isn't suitably aligned, but it is what it is. Personally I have generic macros for no-strict-aliasing loads and stores, and I use both memcpy and __builtin_assume_aligned in the implementation (the assumed alignment is that of the target type). Again, the `__builtin_assume_aligned` shouldn't be needed there, but gcc sometimes requires it if you want good codegen. – Petr Skocik Nov 08 '20 at 21:54
  • I've verified this on my end. Same problem with MIPS. The `__builtin_*` functions can be useful in practice, but as the original question mentions, we prefer not to assume any specific toolchain. Nonetheless, this can't easily be helped, so I'm accepting this answer. @PSkocik, it might make sense to add your comment to the answer, for future reference. – Matej Kubicka Nov 09 '20 at 08:38
  • @MatejKubicka: Toolchain support for a mode equivalent to `-fno-strict-aliasing` is far more universal than support for constructs like `__builtin_assume_aligned`. Any "optimizer" that would make one jump through hoops to accomplish a certain task that would be easy in its absence is, for purposes of that task, not an optimizer. – supercat Nov 10 '20 at 22:38
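One way to reconcile Petr's `memcpy` + `__builtin_assume_aligned` approach with the "no toolchain assumptions" requirement is to use the builtin only when the compiler advertises GNU extensions (`__GNUC__`, which GCC and clang define) and fall back to a plain `memcpy` elsewhere. This is a sketch: the macro `ASSUME_ALIGNED` and the functions `load_u32_a4`/`load_u64_a4` are hypothetical names, and whether it restores single-instruction loads on a particular MIPS toolchain is an assumption to check on Compiler Explorer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portability macro: tell GCC/clang that P is aligned to N
   bytes; other compilers just get P back unchanged (still correct, maybe
   slower). Promising an alignment that does not hold is undefined, so N
   must be true for every call site. */
#if defined(__GNUC__)
#  define ASSUME_ALIGNED(P, N) __builtin_assume_aligned((P), (N))
#else
#  define ASSUME_ALIGNED(P, N) (P)
#endif

/* The question only guarantees a 4-byte-aligned blob (and, by assumption,
   4-byte-aligned fields), so 4 is the strongest promise we can make,
   even for the 64-bit reads. */
static inline uint32_t load_u32_a4(const void *p)
{
    uint32_t v;
    memcpy(&v, ASSUME_ALIGNED(p, 4), sizeof v);
    return v;
}

static inline uint64_t load_u64_a4(const void *p)
{
    uint64_t v;
    memcpy(&v, ASSUME_ALIGNED(p, 4), sizeof v);
    return v;
}
```

The access itself still goes through `memcpy`, so the code stays within 6.5p7 on every compiler; the builtin only gives GCC/clang extra alignment information for code generation.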

Is casting a const uint8_t* to const uint32_t*, or const uint64_t* well-defined in C99, as long as we are sure we do not alias the same pointers to both const uint32_t* and const uint64_t*?

In general, no, as the alignment requirements of uint32_t or uint64_t may exceed those of uint8_t.

In OP's case, it is likely OK. A description of the code is not as good as the actual code for making certain.
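The alignment point matters most for the 64-bit reads: the blob is only promised to be 4-byte aligned, while on the 32-bit ARM EABI, MIPS o32 and x86_64 ABIs a `uint64_t` is 8-byte aligned. Converting a pointer that is not correctly aligned for the pointed-to type is undefined in C99 (6.3.2.3p7), independently of aliasing. C99 has no `_Alignof`, so the sketch below uses the classic `offsetof` probe, which reports the alignment the ABI gives a `uint64_t` struct member. The names `align_probe_u64`, `ALIGNOF_U64` and `aligned_for_u64` are hypothetical, and treating the integer value of a `uintptr_t` (an optional type in C99) as reflecting address alignment is an assumption that holds on common flat-address-space targets but is not guaranteed by the standard.

```c
#include <stddef.h>
#include <stdint.h>

/* C99 has no _Alignof; the offset of x after a lone char is the
   alignment the ABI uses for a uint64_t struct member. */
struct align_probe_u64 { char c; uint64_t x; };
#define ALIGNOF_U64 offsetof(struct align_probe_u64, x)

/* Nonzero if p could be converted to (const uint64_t *) without a
   misalignment problem. Assumes uintptr_t exists and that its value
   reflects the address's alignment (true in practice on ARM, x86_64
   and MIPS, not promised by the standard). */
static int aligned_for_u64(const void *p)
{
    return ((uintptr_t)p % ALIGNOF_U64) == 0;
}
```

Where such a check fails (e.g. a 64-bit field at an address that is 4 modulo 8), reading through `memcpy`, as in the first answer, remains well-defined.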

chux - Reinstate Monica