When a cast violates the strict aliasing rule in C or C++, the compiler may optimize in a way that propagates a wrong constant value, and it may permit unaligned access, which can cause performance degradation or bus errors.

I wrote a simple example to see how GCC and Clang optimize constants when I violate the strict aliasing rule.

Here is the code and the instructions that I got.

#include <stdio.h>
#include <stdlib.h>

int
foo () // result differs between GCC and Clang
{
    int x = 1;
    long *fp = (long *)&x;
    *fp = 1234L;

    return x;
}

//int and long are not compatible 
//Wrong constant propagation as a result of strict aliasing violation
long
bar(int *ip, long *lp)
{
    *lp = 20L;
    *ip = 10;

    return *lp;
}

//char is always compatible with others
//constant is not propagated and memory is read
char
car(char *cp, long *lp)
{
    *cp = 'a';
    *lp = 10L;
    return *cp;
}

Compiling the code with GCC 8.2 and the -std=c11 -O3 options gives:

foo:
  movl $1234, %eax
  ret
bar:
  movq $20, (%rsi)
  movl $20, %eax
  movl $10, (%rdi)
  ret
car:
  movb $97, (%rdi)
  movq $10, (%rsi)
  movzbl (%rdi), %eax
  ret

Compiling the code with Clang 7.0 and the -std=c11 -O3 options gives:

foo: # @foo
  movl $1, %eax
  retq
bar: # @bar
  movq $20, (%rsi)
  movl $10, (%rdi)
  movl $20, %eax
  retq
car: # @car
  movb $97, (%rdi)
  movq $10, (%rsi)
  movb (%rdi), %al
  retq

The bar and car functions generate almost the same instruction sequences under both compilers, and the return values are the same in both cases: bar violates the rule, so the constant is propagated; car does not violate it, so the correct value is read from memory.

However, foo, which violates the strict aliasing rule, produces different output in GCC and Clang: GCC propagates the value that was actually stored in memory (though without a memory reference), while Clang propagates a wrong value. Both compilers seem to apply constant propagation as an optimization, so why do they produce different results? Does this mean that GCC automatically detects the strict aliasing violation in foo and propagates the correct value?

Why do they show different instruction streams and results?

ruach
  • How does this question differ from your previous (now deleted) question? Don't delete and repost questions. If you need to improve your question, *edit it*. – Some programmer dude Jan 17 '19 at 07:31
  • Sorry for the inconvenience. I found that the difference was due to the compiler, not to a language difference between C and C++. That is why I did it. My apologies. – ruach Jan 17 '19 at 07:32
  • Strict aliasing rule violation is UB. And Undefined Behavior means the compiler can do anything. – Anty Jan 17 '19 at 09:36
  • @Anty Then why does GCC generate correct behavior for foo? Is it because, in foo, the variable x pointed to by the float * is not going to be changed at runtime? In other words, is it because x is a local variable? And for bar and car, is it because the arguments are passed in from another function, so the compiler cannot be sure that the two pointers do not point to the same memory and has to rely only on the strict aliasing rule? – ruach Jan 17 '19 at 13:42
  • foo is 100% broken (in UB terms), so the compiler can do anything. bar is fine and does not violate the aliasing rule, so it can be constant propagated. car is fine too, but char can alias any type, so the compiler can't do constant propagation, as lp may point to the same address as cp. – Anty Jan 17 '19 at 14:15
  • @Anty Why can we say that bar doesn't violate the strict aliasing rule? It casts the int variable to float *, and those do not seem to be compatible types. – ruach Jan 17 '19 at 14:51
  • Look at your examples - bar uses int and long, there is no float at any function... – Anty Jan 17 '19 at 15:18
  • Oops, sorry, I wrote float instead of long by mistake. I mean that when the same memory location is cast to both a long pointer and an int pointer, this results in a violation of strict aliasing. – ruach Jan 17 '19 at 15:22
  • Perhaps you could remove the C++ part of the question and ask a separate question about that if needed? Because this question will not be easily found with the current tags. I would recommend retagging as: C, gcc, clang, compiler-optimization, strict-aliasing. The C11 and C++17 tags were meant to be accompanied by C and C++ respectively. – Lundin Jan 18 '19 at 07:38
  • When a teacher gives contradicting instructions (over time) to pupils (happened to me), some will apply the newest orders, some will suppose the older one still holds, some will act randomly. It's unpredictable and you can't punish someone for acting non-deterministically in the case of a contradiction. – curiousguy Jan 18 '19 at 19:40
  • See https://stackoverflow.com/a/4105123/1505939 – M.M Jan 19 '19 at 23:27

1 Answer

Why can we say the bar doesn't violate the strict aliasing rule?

If the code that calls bar does not violate strict aliasing, bar will not violate strict aliasing either.

Let me give an example.

Suppose we call bar like this:

int x;
long y;
bar(&x, &y);

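Strict aliasing requires that an object be accessed only through an lvalue of a compatible type (character types being the main exception), so pointers to incompatible types such as int and long must not be used to access the same memory. &x and &y have different types, and they refer to different memory, so this does not violate strict aliasing.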
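
As a minimal sketch of this well-defined case, the block below repeats bar from the question and adds a small driver. The name call_bar_distinct is hypothetical and only there for illustration. Because x and y are separate objects, the store through ip cannot touch y, so returning the constant 20 without re-reading memory is a legitimate optimization.

```c
/* bar as written in the question */
long bar(int *ip, long *lp)
{
    *lp = 20L;
    *ip = 10;

    return *lp;
}

/* Hypothetical driver: the two pointers refer to distinct objects */
long call_bar_distinct(void)
{
    int x;
    long y;

    /* x and y do not overlap, so the write to *ip cannot change *lp */
    return bar(&x, &y);
}
```

Calling call_bar_distinct() from main should yield 20 on any conforming compiler, with or without optimization.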

On the other hand, let's say we call it like this:

long y;
bar((int *) &y, &y);

Now we've violated strict aliasing. However, the violation is the caller's fault.
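
For contrast, here is a sketch of the same calling pattern applied to car from the question. The driver name call_car_aliased is hypothetical. Passing the same object through a char * and a long * is allowed, because a character type may alias any object, which is why both compilers re-read *cp instead of propagating 'a'. The byte that comes back depends on the platform's byte order, so no single value is assumed here.

```c
/* car as written in the question: char may alias any object type */
char car(char *cp, long *lp)
{
    *cp = 'a';
    *lp = 10L;
    return *cp; /* must be re-read: the store to *lp may have overwritten it */
}

/* Hypothetical driver: both arguments refer to the same long */
char call_car_aliased(void)
{
    long y = 0;

    /* Well-defined: accessing y's bytes through a char * is permitted */
    return car((char *)&y, &y);
}
```

The result is the first byte of the stored 10L: 10 on a little-endian machine and 0 on a big-endian one, but never 'a'.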

Nick ODell
  • I didn't know that UB means the compiler can do anything, including nothing. Thanks also for pointing out that the strict aliasing rule is violated when the caller passes the same memory location cast to incompatible types. Lastly, thanks for answering the question even though many people dislike it, as the negative score shows. Should I mark this question as a possible duplicate of one of the links referenced above? – ruach Jan 20 '19 at 10:37
  • @JaehyukLee: The authors of the Standard intended that many implementations would process many actions whose behavior is not defined by the Standard "in a documented fashion characteristic of the environment". Language vandals, however, have somehow made fashionable the notion that any programs relying upon such behavior have always been "broken", despite the fact that the authors of the Standard have explicitly recognized that one of C's strengths is its ability to do things that could not be done using portable programs. – supercat Jan 29 '19 at 21:28