8

I have been reading about the strict aliasing rule for a while, and I'm starting to get really confused. First of all, I have read these questions and some answers:

According to them (as far as I understand), accessing a char buffer using a pointer to another type violates the strict aliasing rule. However, the glibc implementation of strlen() has such code (with comments and the 64-bit implementation removed):

size_t strlen(const char *str)
{
    const char *char_ptr;
    const unsigned long int *longword_ptr;
    unsigned long int longword, magic_bits, himagic, lomagic;

    for (char_ptr = str; ((unsigned long int) char_ptr 
             & (sizeof (longword) - 1)) != 0; ++char_ptr)
       if (*char_ptr == '\0')
           return char_ptr - str;

    longword_ptr = (unsigned long int *) char_ptr;

    himagic = 0x80808080L;
    lomagic = 0x01010101L;

    for (;;)
    { 
        longword = *longword_ptr++;

        if (((longword - lomagic) & himagic) != 0)
        {
            const char *cp = (const char *) (longword_ptr - 1);

            if (cp[0] == 0)
                return cp - str;
            if (cp[1] == 0)
                return cp - str + 1;
            if (cp[2] == 0)
                return cp - str + 2;
            if (cp[3] == 0)
                return cp - str + 3;
        }
    }
}

The longword_ptr = (unsigned long int *) char_ptr; line obviously aliases an unsigned long int to char. I fail to understand what makes this possible. I see that the code takes care of alignment problems, so no issues there, but I think this is not related with the strict aliasing rule.

The accepted answer for the third linked question says:

However, there is a very common compiler extension allowing you to cast properly aligned pointers from char to other types and access them, however this is non-standard.

Only thing comes to my mind is the -fno-strict-aliasing option, is this the case? I could not find it documented anywhere what glibc implementors depend on, and the comments somehow imply that this cast is done without any concerns like it is obvious that there will be no problems. That makes me think that it is indeed obvious and I am missing something silly, but my search failed me.

Yağmur Oymak
  • 467
  • 4
  • 13
  • Have you seen this? https://stackoverflow.com/questions/37302691/what-are-the-strict-aliasing-rules-when-casting-from-a-char-array – John Zwinck Jun 08 '17 at 22:59
  • 2
    The code could simply be badly written. The alignment check that converts a pointer `(unsigned long int) char_ptr` is also fishy. And they go through all this trouble to attempt some weird optimization which adds extra branches and doesn't necessary look faster, possibly slower. – Lundin Jun 09 '17 at 10:52
  • 1
    So why would GNU people (AFAIK also FreeBSD people apply a similar optimization) go through all this if it did not make it any faster? – Yağmur Oymak Jun 09 '17 at 11:03
  • @YağmurOymak I don't know, probably because they like to pose by writing weird algorithms with magic numbers? Their pre-mature optimization doesn't make sense on many systems. For example this function will be incredibly slow on 8 bit computers. Whoever wrote this code most likely assumed that every computer in the world is a Linux PC. – Lundin Jun 09 '17 at 11:23
  • The code assumes `sizeof(unsigned long int) == sizeof(void *)`, and `typedef unsigned long int uintptr_t;` in `` also appears. Here's [the full implementation with nothing stripped for those interested](https://sourceware.org/git/?p=glibc.git;f=string/strlen.c;hb=HEAD). Similar code casting a pointer to `unsigned long int` can be found in [the implementation of `memcpy`](https://sourceware.org/git/?p=glibc.git;f=string/memcpy.c;hb=HEAD). On my machine using GCC (`-std=gnu11|c11`), the resulting behavior is the same, `-fstrict-aliasing` or not. No diagnostics appear during compilation –  Jun 10 '17 at 03:20
  • @Lundin Who uses glibc on 8 bit computers? – curiousguy Jun 11 '17 at 01:59
  • 1
    @curiousguy A whole lot of people, I would imagine. GCC has become very popular in all kinds of embedded systems programming during the last 5-10 years. And it is raining bugs on such systems because of it, most of them actually caused by gcc's aggressive optimization based on strict aliasing. – Lundin Jun 12 '17 at 06:42
  • 1
    A relevant glibc mailing list post: https://sourceware.org/ml/libc-alpha/2016-02/msg00052.html – Tavian Barnes Jun 19 '17 at 15:59
  • @lundin _doesn't necessary look faster, possibly slower_ Yes, what on earth are they playing at there? – Paul Sanders Jul 09 '18 at 20:07

4 Answers4

10

In ISO C this code would violate the strict aliasing rule. (And also violate the rule that you cannot define a function with the same name as a standard library function). However this code is not subject to the rules of ISO C. The standard library doesn't even have to be implemented in a C-like language. The standard only specifies that the implementation implements the behaviour of the standard functions.

In this case, we could say that the implementation is in a C-like GNU dialect, and if the code is compiled with the writer's intended compiler and settings then it would implement the standard library function successfully.

M.M
  • 138,810
  • 21
  • 208
  • 365
  • 3
    From what I can tell, the glibc people intended the library to be compiled with gcc (-std=gnu11 or similar). Concerns about strict aliasing isn't mentioned anywhere. https://www.gnu.org/software/libc/manual/html_node/Configuring-and-compiling.html. Probably the code is just badly written and contains a bug. – Lundin Jun 09 '17 at 11:29
  • @Lundin What bug? – curiousguy Jun 11 '17 at 02:06
  • @curiousguy The one mentioned in the question. `longword_ptr = (unsigned long int *) char_ptr;` – Lundin Jun 12 '17 at 06:43
  • @Lundin Then prove it is a bug. – curiousguy Jun 12 '17 at 19:14
  • @curiousguy This post already assumes that you know about the strict aliasing rule. You can read all about it in C11 6.5 §6 and §7. Summary here: https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule – Lundin Jun 13 '17 at 06:29
  • @Lundin You still have not proven there is a bug. – curiousguy Jun 15 '17 at 04:39
  • @curiousguy It is obvious to anyone who knows the strict aliasing rule. I'm not going to educate you about it in comments. The actual bug is here: `*longword_ptr`. The data is accessed through a pointer of a different, non-compatible type than the effective type of the data pointed at. – Lundin Jun 15 '17 at 06:38
  • @Lundin There is no bug. – curiousguy Jun 15 '17 at 22:28
  • 2
    @Lundin Sorry, he does this (and it just wastes everybody's time). – Paul Sanders Jul 09 '18 at 20:08
3

When writing the aliasing rules, the authors of the Standard only considered forms that would be useful, and should thus be mandated, on all implementations. C implementations are targeted toward a variety of purposes, and the authors of the Standard make no attempt to specify what a compiler must do to be suitable for any particular purpose (e.g. low-level programming) or, for that matter, any purpose whatsoever.

Code like the above which relies upon low-level constructs should not be expected to run on compilers that make no claim of being suitable for low-level programming. On the flip side, any compiler which can't support such code should be viewed as unsuitable for low-level programming. Note that compilers can employ type-based aliasing assumptions and still be suitable for low-level programming if they make a reasonable effort to recognize common aliasing patterns. Some compiler writers are very highly invested in a view of code which fits neither common low-level coding patterns, nor the C Standard, but anyone writing low-level code should simply recognize that those compilers' optimizers are unsuitable for use with low-level code.

supercat
  • 77,689
  • 9
  • 166
  • 211
0

The wording of the standard is actually a bit more weird than the actual compiler implementations: The C standard talks about declared object types, but the compilers only ever see pointers to these objects. As such, when a compiler sees a cast from a char* to an unsigned long*, it has to assume that the char* is actually aliasing an object with a declared type of unsigned long, making the cast correct.

A word of caution: I assume that strlen() is compiled into a library that is later only linked to the rest of the application. As such, the optimizer does not see the use of the function when compiling it, forcing it to assume that the cast to unsigned long* is indeed legit. If you called strlen() with

short myString[] = {0x666f, 0x6f00, 0};
size_t length = strlen((char*)myString);    //implementation now invokes undefined behavior!

the cast within strlen() is undefined behavior, and your compiler would be allowed to strip pretty much the entire body of strlen() if it saw your use while compiling strlen() itself. The only thing that allows strlen() to behave as expected in this call is the fact, that strlen() is compiled separately as a library, hiding the undefined behavior from the optimizer, so the optimizer has to assume the cast to be legit when compiling strlen().

So, assuming that the optimizer cannot call "undefined behavior", the reason why casts from char* to anything else are dangerous, is not aliasing, but alignment. On some hardware, weird stuff starts happening if you try to access a misaligned pointer. The hardware might load data from the wrong address, raise an interrupt, or just process the requested memory load extremely slowly. That is why the C standard generally declares such casts undefined behavior.

Nevertheless, you see that the code in question actually handles the alignment issue explicitly (the first loop that contains the (unsigned long int) char_ptr & (sizeof (longword) - 1) subcondition). After that, the char* is properly aligned to be reinterpreted as unsigned long*.

Of course, all of this is not really compliant with the C standard, but it is compliant with the C implementation of the compiler that this code is meant to be compiled with. If the gcc people modified their compiler to act up on this bit of code, the glibc people would just complain about it loud enough so that the gcc will be changed back to handle this kind of cast correctly.

At the end of the day, standard C library implementations simply must violate strict aliasing rules to work properly and be efficient. strlen() just needs to violate those rules to be efficient, the malloc()/free() function pair must be able to take a memory region that had a declared type of Foo, and turn it into a memory region of declared type Bar. And there is no malloc() call inside the malloc() implementation that would give the object a declared type in the first place. The abstraction of the C language simply breaks down at this level.

cmaster - reinstate monica
  • 38,891
  • 9
  • 62
  • 106
  • I hate reading it, but I don't see a violation given [**6.5 (7) (C11 draft n1570)**](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), last bullet point, `a character type`. The only *'access'* is though a character type. (even if that char type is cast to something else) – David C. Rankin Jun 11 '17 at 04:59
  • 1
    @DavidC.Rankin Strict aliasing rules are all about reordering accesses via pointers of different types. If you have an `int*`, the exception that you cite allows you to safely cast it to a `char*` and manipulate its bytes via this pointer. It does not allow you to subsequently cast that `char*` to a `float*`, and then access the integer data as float data: **The compiler would be allowed to read the data behind the `float*` before storing any data via the `int*`**. `strlen()`doesn't know whether the data was stored as `char`, the caller might have legally cast an `int*` to a `char*`. – cmaster - reinstate monica Jun 11 '17 at 08:58
  • @DavidC.Rankin It means that you can safely cast from `type*` to `char*`, but not the other way around, as done in this case. – Lundin Jun 12 '17 at 06:46
  • Would the downvoters please leave a comment on their reason? – cmaster - reinstate monica Jun 12 '17 at 07:13
  • @cmaster Yes, it is frustrating when they do that. @Lundin thank you. I just want to make sure I'm clear. So you are saying if I have `int a[] = {1,2,3}, *b = a;` I can validly say `char *c = (char *)b;`, but I cannot then say `int *d = (int *)c;` That's fine because the effective type is always `int *`, isn't that right? So the violation in the original question would be due to the cast back not being to the original effective type? – David C. Rankin Jun 12 '17 at 08:13
  • 1
    @DavidC.Rankin The effective type of one item in the array is `int`. So you can apply any number of pointer type casts to that address and then cast it to `int*` in the end at it would be fine. Strict aliasing only refers to the type used when the stored value is accessed. So `*(int*)(bananas_t*)(void*)(char*)a = 0;` is fine, it does not violate strict aliasing. Neither does `*(char*)a = 0;`. What does violate strict aliasing is going the other way around, from char to int: `char a[] = {1,2,3,4}; printf("%d", *(int*)a);`. Note that the strict aliasing rule does not address misalignment. – Lundin Jun 12 '17 at 08:20
  • Thanks again. I do get all of that. The part I don't get, is I can compile the `strlen` function with `-Wall -Wextra -pedantic` and `gcc` issues no peep about a strict alias violation (it is generally really good at letting me know) and only complains of the `‘magic_bits’ [-Wunused-variable]`. There are several versions of this code, all based loosely on the `#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)` check for a zero byte in 8-bytes. I have not received a warning for any of them. Thus I will just resign to be confuzzled about the wording of 6.5 (7). `:)` – David C. Rankin Jun 12 '17 at 09:07
  • 1
    @DavidC.Rankin Well, undefined behavior is there for the optimizer to exploit, not for the compiler to complain - and it's the optimizer that quite often bites you when your program does not work as expected due to undefined behavior (= a vital code portion simply got optimized away as the optimizer was able to prove that it would only be executed after your program invoked undefined behavior). I never expect as much as a warning when it comes to undefined behavior. – cmaster - reinstate monica Jun 12 '17 at 19:04
  • 1
    I didn't vote but your rationale isn't really solid; the strict aliasing rule is not about pointers, and the assumptions you are making about how compilers work won't always hold up when there are aggressive optimization passes and then the TBAA pas. – M.M Jun 12 '17 at 21:03
  • @M.M I have now added a bit to be more clear about the role of undefined behavior in this case. I hope, this avoids further confusion. Thanks for the feedback, anyway :-) – cmaster - reinstate monica Jun 13 '17 at 07:45
  • @cmaster: The authors of the Standard recognize that it would allow a conforming implementation to be of such poor quality as to be useless, and makes no attempt to forbid every silly thing an implementation might do. It would reasonable for a compiler--even one intended for low-level programming--to assume that if a T1* is accessed twice, with no intervening "mention" of a T1, that it can consolidate the two accesses. That would be reasonable (albeit forbidden by the Standard) even if there were intervening accesses via `char*`. On the other hand, it has never been reasonable... – supercat Jul 31 '18 at 21:54
  • ...for a compiler that claims to be suitable for low-level programming to make the same assumption about two accesses via `T1*` which are separated by a cast from `T1*` to `T2*` and a subsequent use of that `T2*`. Unfortunately, some compiler writers would rather claim that low-level code is "broken" and insist the Standard allows compilers to behave in low-quality fashion, than try to make compilers process low-level code efficiently. Some compilers like `icc` work sensibly, but aren't available for many embedded platforms. – supercat Jul 31 '18 at 21:57
-4

The underlying assumption is probably that the function is separately compiled, and not available for inlining or other cross function optimizations. This means that no compile time information flows inside or outside the function.

The function doesn't try to modify anything through a pointer, so there is no conflict.

curiousguy
  • 8,038
  • 2
  • 40
  • 58