
After reading the article Understanding Strict Aliasing (https://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html), I see how breaking the strict-aliasing rules can cause unexpected results in an optimized build. For example:

void test(int* ptr1, float* ptr2);

Since int and float are incompatible types, the compiler assumes that ptr1 and ptr2 never point to the same memory. This allows it to optimize the code, for example by keeping a value in a register instead of reloading it, which can give unexpected results if the pointers have the same value.
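To make this concrete, here is a minimal sketch of what such a function might look like. The name and signature of test come from the declaration above; the body is my own assumption, and test_bytes is a hypothetical well-defined counterpart that writes the float through memcpy instead.

```cpp
#include <cstring>

static_assert(sizeof(float) == sizeof(int), "sketch assumes equal sizes");

// Assumed body for test(): only the signature appears in the question.
int test(int* ptr1, float* ptr2)
{
    *ptr1 = 1;
    *ptr2 = 2.0f;   // the compiler may assume this store does not touch *ptr1
    return *ptr1;   // so an optimizer is free to return 1 without reloading
}

// Hypothetical well-defined variant: the float is written as raw bytes.
int test_bytes(int* ptr1, void* dst)
{
    *ptr1 = 1;
    const float value = 2.0f;
    std::memcpy(dst, &value, sizeof value); // memcpy may overwrite any object
    return *ptr1;   // must be reloaded, because dst may point at *ptr1
}
```

With distinct objects, both functions behave the same. A call such as test(&n, reinterpret_cast<float*>(&n)) is UB: an optimized build may return 1 even though n now holds the bits of 2.0f, while an unoptimized build may reload n. The call test_bytes(&n, &n) is well defined and returns whatever n holds after the copy.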

However, in legacy code the strict-aliasing rule is mostly broken in simple assignments such as `int n = 0; float f = *((float*)&n);`. Consider the following code:

#include <cstring>   // std::memcpy
#include <iostream>

static_assert (sizeof(float) == sizeof(int), "error");

int main(int argc, char *argv[])
{
    float f1, f2;
    int n = static_cast<int>(argv[1][0] - '0'); // Command argument is "0", so n = 0

    std::memcpy(&f1, &n, sizeof(f1));           // strict-aliasing rule is not broken
    f2 = *(reinterpret_cast<float*>(&n));       // strict-aliasing rule is broken

    std::cout << f1 << " " << f2 << std::endl;  // prints 0 0
    return 0;
}

I wonder how a C++ compiler could even produce optimized code that gives different f1 and f2 values, that is, an unexpected result for the code that breaks the strict-aliasing rule.

I inspected the assembly code produced by the VC++ 2015 compiler in Debug and Release builds (32-bit, for simplicity). In both cases, the f2 assignment is compiled to two movss instructions, like this:

movss       xmm0,dword ptr [n]  
movss       dword ptr [esp+4],xmm0  

So I would understand if a modern C++ compiler gave an error or warning on the offending line. But if compilation succeeds, what optimized assembly code could give an unexpected result?

Notes:

  1. This code intentionally breaks the strict-aliasing rule.

  2. I know that this is UB.

  3. I'm not asking what the strict-aliasing rule is; I want to know how breaking the rule can cause UB in this specific case.

Alex F
  • I don't understand what you're asking. it's not that it "can" cause UB, it *does* cause UB. What behaviour you end with is, well, undefined. Are you asking if a specific optimizing compiler in a specific version handles this deterministically, and how? And if so, which compiler? – Quentin Nov 07 '18 at 10:14
  • @Quentin `it does cause UB` Agreed, but I want to see this at the assembly level. – Alex F Nov 07 '18 at 10:16
  • UB cannot be "seen". It's a language-level concept, which specifies that anything can be generated from your code, including a seemingly working binary. – Quentin Nov 07 '18 at 10:18
  • *how breaking the rule can cause UB* By definition. – n. m. could be an AI Nov 07 '18 at 10:19
  • [gcc, strict-aliasing, and horror stories](https://stackoverflow.com/a/2959468) shows an example of a strict-aliasing violation causing a problem: stores of a different type are moved around a memory copy that uses `long*`. (An unsafe macroed definition of `memcpy` that the Linux kernel used to use, and changed after gcc strict-aliasing broke it). – Peter Cordes Nov 07 '18 at 10:26
  • @PeterCordes Thanks, the last code fragment from your link: `unsigned long a;` and assigning through `unsigned short` pointer is what I am looking for. Can you post this as answer with some details? – Alex F Nov 07 '18 at 10:31
  • @AlexF if you're looking for discrepancies from UB, [here](https://gcc.godbolt.org/z/f0vMjk)'s a demo. The additional function call in `f2` makes no difference according to the language, but since the code has UB `f1` and `f2` return different results. – Quentin Nov 07 '18 at 10:34
  • I didn't post an answer because it wouldn't answer the question as written. You're still asking "how breaking the rule can cause UB", and the answer is "by definition" like n.m. said. UB is a C abstract-machine concept. Phrase it differently if you want to ask what kind of asm-level effects can result from strict-aliasing violations with `gcc -O3` targeting x86-64 or other commonly used architectures. – Peter Cordes Nov 07 '18 at 10:56
  • OK, the answer to my question is here: https://stackoverflow.com/questions/2958633/gcc-strict-aliasing-and-horror-stories/2959468#2959468 (thanks to Peter Cordes) and can be summarized as: `unsigned long a; a = 5; *(unsigned short *)&a = 4;` could be re-ordered to set it to 4 first (because clearly they don't alias) and then because now the assignment of 5 was later, the assignment of 4 could be elided entirely. I didn't think about `reordering`, the lesson is learnt. – Alex F Nov 07 '18 at 11:08

1 Answer


Once you have UB, anything can happen.

The compiler is allowed to do anything with your program.

Some compilers "remove" a branch when they detect that it can only be reached through UB, so your program might, for example, display nothing. That's why reasoning about UB is useless.
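As a rough illustration of branch removal, here is a commonly cited pattern; it uses a null-pointer dereference rather than strict aliasing, and both function names are hypothetical. Whether a given compiler actually drops the branch depends on the compiler, version, and flags, so treat this as a sketch and not as a guaranteed outcome.

```cpp
// Hypothetical example: the dereference comes before the null check.
int read_then_check(int* p)
{
    int value = *p;      // UB if p is null, so the compiler may assume p != nullptr
    if (p == nullptr)    // an optimizer may therefore delete this branch entirely
        return -1;
    return value;
}

// Well-defined ordering: check first, dereference only a valid pointer.
int check_then_read(int* p)
{
    if (p == nullptr)
        return -1;
    return *p;
}
```

For a valid pointer both functions return the pointed-to value. Only check_then_read has defined behavior for a null argument; for read_then_check nothing can be said about what a null argument does.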

Jarod42