Given this concrete example:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
 
 
int main(void)
{
  uint8_t stuff[8] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
  int64_t bigint = *((int64_t *) stuff);
 
  printf("bigint = %" PRId64 "\n", bigint);  /* PRId64 is the portable format for int64_t */
 
  return 0;
}

Am I violating the C11 standard? Note that it appears to work correctly when compiled with recent gcc and clang using "-Wall -Wextra -fstrict-aliasing -std=c11", but gcc 6.4, for example, warns that "dereferencing type-punned pointer will break strict-aliasing rules", which is what I would expect from my understanding.

Off the top of my head, the safe way to do this is to perform the always-safe cast to a character type and `memcpy`. However, I want to understand whether this is UB or merely implementation-defined: safe to do but not portable.
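For reference, the `memcpy` approach mentioned above could look like the following minimal sketch. The helper name `load_i64` is hypothetical (it does not appear in the original code), and it assumes the source buffer holds at least `sizeof(int64_t)` bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the original example.
   Copies sizeof(int64_t) bytes from src into an int64_t object.
   memcpy copies the bytes as characters, so no int64_t lvalue is ever
   used to access the uint8_t array, and src needs no particular alignment. */
static int64_t load_i64(const uint8_t *src)
{
  int64_t value;
  memcpy(&value, src, sizeof value);
  return value;
}
```

In `main`, the cast would then become `int64_t bigint = load_i64(stuff);`. The resulting value still depends on the machine's byte order: on a little-endian target the bytes correspond to the bit pattern 0x8877665544332211.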

  • Yes, it is UB. The `char` type exception to the strict aliasing rule only works in the other direction: when accessing SOME data using a character pointer. However, you cannot access character data using a non-character type such as `int64_t`. What you're trying to do can be done with either `memcpy` or with a `union`. – Petr Skocik Jan 22 '21 at 08:40
  • @DavidC.Rankin You can't go from uint8_t to int64_t though. Strict aliasing only allows the other way around as a special exception. – Lundin Jan 22 '21 at 08:41
  • Also in theory the uint8_t array could be misaligned and then it would be UB to lvalue access it as an aligned type for that reason. – Lundin Jan 22 '21 at 08:42
  • @Lundin at least alignment can be fixed with alignas(sizeof(uint64_t)) on the array, supposing it is. Still UB? – anonymouscoward Jan 22 '21 at 08:45
  • @DavidC.Rankin PSkocik is correct, the standard (6.5/7) says "An object shall have its stored value accessed only by an lvalue expression that has one of the following types:" ... "a character type." So if we have a character type, its stored value cannot be accessed by an lvalue expression of `int64_t` because `int64_t` is not a character type, nor does it match any other exception to the rule. – Lundin Jan 22 '21 at 08:45
  • I'll defer to the wiser minds here on the "type compatible" issue -- but I think we are back in the murky part of the strict aliasing rule. – David C. Rankin Jan 22 '21 at 08:47
  • @DavidC.Rankin Library implementations do all manner of wild things. Those are special cases as they are part of the internals of the implementation (compiler). They need to be compiled with special settings like `-fno-strict-aliasing`. – Lundin Jan 22 '21 at 08:47
  • I -- happily -- defer in that case. – David C. Rankin Jan 22 '21 at 08:48
  • @DavidC.Rankin Btw the strict aliasing rules are based on the rules of _effective type_, which in turn is defined based on the use of functions like `memcpy`. Now suppose you are to write the library code for `memcpy` according to the rules of effective type. So in order to do so, you must first implement `memcpy`... that's a catch 22 - it's simply impossible to implement `memcpy` according to the rules in 6.5. – Lundin Jan 22 '21 at 08:51
  • @DavidC.Rankin Libcs sometimes do trigger "benign" UB if it's known to be benign (or contained, as in not causing trouble across translation unit boundaries) on the given platform. It's safer to use `__attribute__((may_alias))`, or `memcpy(...,__builtin_assume_aligned(Ptr,4), 4)`, though (`__builtin_assume_aligned` isn't technically needed but sometimes helps gcc/clang generate code as good as what the corresponding cast would generate). I think the standard is clear here and this particular issue has been discussed on SO multiple times, pretty much always with this conclusion. – Petr Skocik Jan 22 '21 at 08:52
  • Yes, I understand the concept, but after picking through the `strlen()` code, there was a bit of confusion. Specifically it does something similar to `const char *s` and then to iterate it does `unsigned x = *(unsigned*)s;` and then proceeds to check the bytes of `x[0]` - `x[3]` looking for a zero byte. Now I have run this specifically without disabling `-fno-strict-aliasing` and `-std=c11` is fine with it. (`-Wall -Wextra -pedantic`) So that's why I say we are in a murky area of the rule. – David C. Rankin Jan 22 '21 at 08:56
  • Yeah the effective type/strict aliasing rules are shakily written and have been criticized by many experts. And it was probably never the intention by the C99 committee that implementations should abuse UB when lvalue accessing between for example different integer types. Some compilers chose to do this (ISO compliant safety hazard compilers) and it created a lot of problems particularly in embedded systems, when open source compilers became increasingly popular because of the ARM hype some 10-15 years back. – Lundin Jan 22 '21 at 08:59
  • @Lundin - here is the example I was thinking of [unsigned from char*](https://godbolt.org/z/5nM6bd). Like I said a bit ago, I will *happily* defer on this issue `:)` – David C. Rankin Jan 22 '21 at 09:07
  • @Lundin, sure that `memcpy` is not implementable as user code in C. That's why it is in the standard library in the first place: it must be integrated into the C implementation and the compiler has to be able to make the assertions that are needed for it. – Jens Gustedt Jan 22 '21 at 14:07
  • @JensGustedt I'm not convinced that user code could be implemented according to 6.5/6 either... there are too many holes in the spec. For example [this](https://stackoverflow.com/questions/65356861/what-rules-are-there-for-qualifiers-of-effective-type) has me quite confused and no one else seems to know the answer either. – Lundin Jan 22 '21 at 14:18
  • @JensGustedt: How does the Standard characterize actions which the vast majority of implementations were expected to process identically, but which some implementations might more usefully be able to process in a manner that could not necessarily be specified in a manner consistent with sequential program execution? – supercat Aug 11 '21 at 22:24
  • @JensGustedt: The requirement that all actions which might cause the effects of optimization to be observable *must* be categorized as UB makes it impossible to write UB-free programs that are as efficient as would be possible absent such a requirement, e.g. in cases where a program copies an entire structure even though the value of some members will never be observed. While it may *for some purposes* be useful to have a diagnostic implementation squawk if a struct is copied without all members having been written first, it would *for many other purposes* be more useful... – supercat Aug 11 '21 at 22:32
  • ...to say that a programmer need not store values to struct members whose contents could never affect program behavior. Consider also the question of how to write code for a function which might fail to terminate for some invalid inputs, and whose results may or may not be used. If it would be acceptable for a compiler to skip calls in cases where the result would be ignored, or for it to hang on invalid input even in such cases, how should a programmer write code to allow for that? Adding a dummy side effect in the loop would block what should be a useful optimization. – supercat Aug 11 '21 at 22:42
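
The `union` alternative mentioned in the first comment could be sketched roughly as follows. The names `pun64` and `pun_value` are hypothetical, and the sketch assumes the bytes can be stored in the union from the start rather than in a separate `uint8_t` array.

```c
#include <stdint.h>

/* Hypothetical union type: both members share the same storage. */
union pun64 {
  uint8_t bytes[8];
  int64_t value;
};

/* Reads the int64_t member after the bytes member was written.
   In C (unlike C++), reading a member other than the one last stored
   reinterprets the object representation in the new type. The union
   also has the alignment required by int64_t. */
static int64_t pun_value(const union pun64 *u)
{
  return u->value;
}
```

A caller would initialize the union with `union pun64 u = { .bytes = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88} };` and then call `pun_value(&u)`. As with `memcpy`, the numeric result depends on the byte order of the target.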
