1

I have the following ostensibly simple C program:

#include <stdint.h>
#include <stdio.h>

uint16_t
foo(uint16_t *arr)
{
  unsigned int i;
  uint16_t sum = 0;

  for (i = 0; i < 4; i++) {
    sum += *arr;
    arr++;
  }

  return sum;
}

int main()
{
  uint32_t arr[] = {5, 6, 7, 8};
  printf("sum: %x\n", foo((uint16_t*)arr));

  return 0;
}

The idea being that we iterate over an array and add up it's 16-bit words ignoring overflow. When compiling this code on x86-64 with gcc and no optimization I get what would seem to be the correct result of 0xb (11) because it's summing the first 4 16-bit words which include 5, and 6:

$ gcc -O0 -o castit castit.c
$ ./castit
sum: b
$ ./castit
sum: b
$

With optimization on it's another story:

$ gcc -O2 -o castit castit.c
$ ./castit
sum: 5577
$ ./castit
sum: c576
$ ./castit
sum: 1de6

The program generates indeterminate values for the sum.

I'm assuming the position that it's not a compiler bug for now, which would lead me to believe that there is some undefined behavior in the program, however I can't point to a specific thing which would lead to it.

Note that when the function foo is compiled to a separately linked module the issue is not seen.

bockmabe
  • 350
  • 2
  • 7
  • 1
    It's interesting to see what sort of compiler optimization did here. I cannot reproduce on my side. Can you put the compiled castit program somewhere for analysis? – cimarron Jan 21 '16 at 01:16
  • What does the assembly code look like when you compile with optimization enabled? – user3386109 Jan 21 '16 at 01:17
  • 1
    The correct format specifier for `uint16_t` is `"%" PRIx16` , it would be good to either use this, or cast the argument to `unsigned int` to match the specifier `%x` – M.M Jan 21 '16 at 01:30
  • [Reproduced in gcc 4.5 through 4.9](http://goo.gl/oCXwwx) . Since 5.0 it gets smart enough to do the computation at compile-time again. I thought strict aliasing optimizations were only performed at -O3 but this example shows they occur at -O2 as well – M.M Jan 21 '16 at 01:34
  • With respect to the question of what the asm looks like, I won't paste it here to prevent clutter, but it has the common signature of undefined behavior in which a segment of code from the -O0 output (which initializes the array on the stack) is just plain missing (optimized out) for the -O2 case. – bockmabe Jan 21 '16 at 09:32

2 Answers2

3

You are breaking the strict aliasing rule, which is indeed UB. That's because you alias your array arr of uint32_t via a pointer of different type, i.e. uint16_t when passing it to foo(uint16_t*).

The only pointer type you can use to alias other types is a char*.

Some additional reading material on the subject: http://dbp-consulting.com/tutorials/StrictAliasing.html

Community
  • 1
  • 1
vsoftco
  • 55,410
  • 12
  • 139
  • 252
  • Thanks for this answer. Indeed -fno-strict-aliasing stops the issue from reproducing (not to say that's always the best solution). Also one thing I found is that for this case gcc isn't smart enough to warn about this even with -Wall on or with -Wstrict-aliasing on explicitly. – bockmabe Jan 21 '16 at 09:28
0

The C language as defined by K & R was not designed to facilitate easy code generation, but not necessarily the generation of efficient code. ANSI et al. added some rules with the intention of allowing compilers to generate more efficient code by assuming programmers wouldn't do certain things, but did so in a way which makes it impossible to implement certain kinds of algorithms cleanly and efficiently. A standards-conforming version of foo which can accept pointers to either uint16_t or uint32_t would look like:

uint16_t foo(uint16_t *arr)  // Could perhaps use void*
{
  unsigned int i;
  uint16_t *src = arr;
  uint16_t temp;
  uint16_t sum = 0;

  for (i = 0; i < 4; i++) {
    memcpy(&temp, src, sizeof temp);
    sum += temp;
    src++;
  }

  return sum;
}

Some compilers will recognize that memcpy can be replaced with code that simply reads *src directly as a uint16_t, so the above code might run efficiently despite its use of memcpy. On some other compilers, however, the above code would run quite slowly. Further, while memcpy made it possible in C89 to do everything which would be possible in the absence of the pointer-usage rules, C99 added additional rules which make some kinds of semantics essentially unachievable. Fortunately, in your particular case, memcpy will allow the algorithm to be expressed in a way that will allow some compilers will yield efficient code.

Note that many compilers include an option to compile a language which extends standard C by eliminating the aforementioned restrictions on pointer usage. On gcc, the option -fno-strict-alias may be used for that purpose. Use of that option may severely degrade the efficiency of some kinds of code unless the programmer takes steps to prevent such degradation (using restrict qualifiers when possible, and copying things that might be aliased to local variables), but in many cases code can be written so -fno-strict-alias won't hurt performance too badly.

supercat
  • 77,689
  • 9
  • 166
  • 211