Is the compiler allowed to store a larger value in an 8-bit uninitialized variable?

Here is example code that illustrates the issue.

#include <stdio.h>
#include <inttypes.h>

int unhex(char *data, char *result, unsigned length)
{
  uint8_t csum; /* in a working code it should be =0 */
  for (unsigned i = 0; i < length; i++) {
    if (!sscanf(data + 2*i, "%2hhx", result + i)) {
      printf("This branch is never taken\n");
      return 597;
    }
    csum += result[i];
  }
  return csum;
}

char a[] = "48454c4c4f";
char b[20];
int main() {
  printf("%d\n", unhex(a,b,5));
  printf("%s\n", b);
}

When compiled with -Og, the program prints:

597
HELLO

However, the first line should clearly be an 8-bit value, i.e. in [0, 256).
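For comparison, here is a minimal sketch of the initialized variant. The name unhex_init and the -1 error code are hypothetical and not part of the code above; the sketch also scans into an unsigned char temporary, since %hhx expects an unsigned char *.

```c
#include <stdio.h>
#include <inttypes.h>

/* Hypothetical variant of unhex() with the checksum initialized. */
int unhex_init(const char *data, char *result, unsigned length)
{
  uint8_t csum = 0; /* defined starting value, so the sum is well defined */
  for (unsigned i = 0; i < length; i++) {
    unsigned char byte;
    /* sscanf returns the number of converted items (or EOF on input failure) */
    if (sscanf(data + 2*i, "%2hhx", &byte) != 1)
      return -1; /* hypothetical error code, outside [0, 256) */
    result[i] = (char)byte;
    csum += byte; /* wraps modulo 256 when stored back into uint8_t */
  }
  return csum;
}
```

With the a and b arrays from the example, unhex_init(a, b, 5) should return 116 (0x48 + 0x45 + 0x4c + 0x4c + 0x4f = 372, and 372 mod 256 = 116) and leave HELLO in b.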

My GCC version is gcc (Gentoo 8.3.0-r1 p1.1) 8.3.0, and I compile with gcc -o stuff -Og -ggdb -Wall -Wextra -Wuninitialized -Wmaybe-uninitialized stuff.c. It does not warn about the possibly uninitialized variable. I know that this is UB, but is it really allowed to go this far?

My understanding is that GCC optimizes csum out entirely and treats return csum as if it were never there in the first place.

So there are two questions, actually:

  • Is it that the compiler cannot inform the programmer of such an action (clearly no explicit assignment to csum throughout the code), or is it just a bug/caveat of GCC?
  • Does the C standard permit such UB concerning uninitialized variables? (here, de facto storing a value outside the type's range)

EDIT: the funny thing is that if += is replaced by =, the compiler complains, just like it should, near the return line:

stuff.c: In function ‘unhex’:
stuff.c:14:10: warning: ‘csum’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   return csum;
          ^~~~
Arusekk
  • "I know that this is UB, but is it really allowed even to such extent?" --> Yes. C does not restrict what UB is. Why expect _undefined behavior_ to be defined? – chux - Reinstate Monica Dec 01 '19 at 09:57
  • Well, I would expect nothing more from _undefined_ behavior, but there are still some limits, like, that the function must preserve callee-saved registers or stuff. – Arusekk Dec 01 '19 at 10:02
  • With UB there are no limits. Still a good presentation of the question UV. – chux - Reinstate Monica Dec 01 '19 at 10:03
  • Curiously 597 (decimal) is 255 (hex), hmmm. – chux - Reinstate Monica Dec 01 '19 at 10:09
  • The Standard deliberately allows implementers to process code that uses uninitialized automatic objects in whatever way their authors feel would best serve their customers, if they care about such things; as a consequence, it also allows compilers to process such code in any fashion their authors see fit for any other reason whatsoever. In many cases, customers would be best served by having compilers simply treat such objects as holding whatever value their assigned storage happens to hold at the start of their lifetime, but sometimes trapping may be more useful even if it adds cost. – supercat Dec 03 '19 at 21:46
