45

In the code below, why does gcc report b[9] as uninitialized instead of out-of-bounds?

#include <stdio.h>

int main(void)
{
    char b[] = {'N', 'i', 'c', 'e', ' ', 'y', 'o', 'u', '!'};
    printf("b[9] = %d\n", b[9]);

    return 0;
}

Compiler call:

% gcc -O2 -W -Wall -pedantic -c foo.c
foo.c: In function ‘main’:
foo.c:6:5: warning: ‘b[9]’ is used uninitialized in this function [-Wuninitialized]
     printf("b[9] = %d\n", b[9]);
% gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Update: Now this is odd:

#include <stdio.h>

void foo(char *);

int main(void)
{
    char b[] = {'N', 'i', 'c', 'e', ' ', 'y', 'o', 'u', '!'};
    foo(&b[9]);
    foo(&b[10]);
    printf("b[9] = %d\n", b[9]);
    printf("b[10] = %d\n", b[10]);

    return 0;
}

Compiling this results in the warnings one would expect:

% gcc -O2 -W -Wall -pedantic -c foo.c
foo.c: In function ‘main’:
foo.c:9:5: warning: array subscript is above array bounds [-Warray-bounds]
     foo(&b[10]);
     ^
foo.c:10:29: warning: array subscript is above array bounds [-Warray-bounds]
     printf("b[9] = %d\n", b[9]);
                             ^
foo.c:11:29: warning: array subscript is above array bounds [-Warray-bounds]
     printf("b[10] = %d\n", b[10]);

Suddenly gcc sees the out-of-bounds for what it is.

Goswin von Brederlow
  • Interestingly clang [gets](https://wandbox.org/permlink/jFGdJNaoPEMcdoRy) it right. – Gaurav Sehgal Jul 17 '18 at 12:45
  • Try `printf("b[10] = %d\n", b[10]);`. Index 9 is one past the end of the array, and is an allowable address (although it's still undefined to actually dereference it...). – Andrew Henle Jul 17 '18 at 12:50
  • @AndrewHenle But b[9] is a dereference, &b[9] would be valid. Some more oddness added to the question. – Goswin von Brederlow Jul 17 '18 at 13:02
  • One past the end of the array may be treated differently - and in your first case, not quite correctly. See the paragraphs on pointer arithmetic in the C standard: [https://port70.net/~nsz/c/c11/n1570.html#6.5.6p8](https://port70.net/~nsz/c/c11/n1570.html#6.5.6p8) – Andrew Henle Jul 17 '18 at 13:08
  • The different warnings are probably from different gcc versions. The behaviours of *both* your samples are undefined by the standards, so compilers are not actually *required* to do anything in particular with them - warnings are not required. The problem for a compiler-developer is that undefined behaviour can manifest in an unlimited number of ways. It is therefore difficult for a compiler to quickly (in the sense of programmers not whinging that it takes too long to compile) work out which warning is "best". – Peter Jul 17 '18 at 13:34
  • @GoswinvonBrederlow: I think I fixed a typo; do please rollback if I'm mistaken. – Bathsheba Jul 17 '18 at 13:35
  • Same gcc in both cases. Antti Haapala is probably right about the array being optimized away when not otherwise used making the out-of-bounds error disappear. – Goswin von Brederlow Jul 17 '18 at 13:41

4 Answers

57

I believe this is what happens here: in the first snippet, GCC notices that you don't need the entire char array at all, just b[9], so it can replace the code with

char b_9; // = ???
printf("b[9] = %d\n", b_9);

Now, this is a completely legal transform: because the array was accessed out of bounds, the behaviour is completely undefined. Only in a later phase does it notice that this variable, which is a substitute for b[9], is uninitialized, and it issues the diagnostic message.

Why do I believe this? Because if I add any code that references the array's address in memory, for example `printf("%p\n", &b[8]);`, anywhere, the array is now fully realized in memory, and the compiler will diagnose "array subscript is above array bounds".


What I find even more interesting is that GCC does not diagnose the out-of-bounds access at all unless optimizations are enabled. This again suggests that whenever you're writing a new program, you should compile it with optimizations enabled to make the bugs highly visible instead of keeping them hidden in debug mode ;)

  • I agree. gcc -O0 code is slow, unreadable, skips a lot of warnings and has little relevance to anything you want to ship. You should always use -Os or -O2. – Goswin von Brederlow Aug 06 '18 at 16:14
16

The behaviour on reading b[9] or b[10] is undefined.

Your compiler is issuing a warning, which it doesn't have to do: a C compiler is not required to issue a diagnostic for out-of-bounds access. The warning text is a little misleading, but not technically incorrect. In my opinion, it's rather clever.

Regarding &b[9], the compiler is not allowed to dereference that, and must evaluate it as b + 9. You are allowed to set a pointer one past the end of an array. The behaviour of setting a pointer to &b[10] is undefined.

Bathsheba
  • Obviously. The question is specifically about the warning gcc shows, not about the behavior or validity of the code. Gcc does know about out-of-bounds errors, and the question is why it doesn't use that here. See the update for more strangeness. – Goswin von Brederlow Jul 17 '18 at 13:05
  • @GoswinvonBrederlow: The gcc warning is correct. b[9] is uninitialised. – Bathsheba Jul 17 '18 at 13:06
  • It's really not uninitialised. It's out-of-bounds. dereferencing it is plain wrong. – Goswin von Brederlow Jul 17 '18 at 13:08
  • @GoswinvonBrederlow: We can agree to disagree on that point. Your question is more interesting now that you've added &b[9] etc. – Bathsheba Jul 17 '18 at 13:09
  • True, `b[9]` is uninitialized. It is also outside the `b[]` array. [I tried](https://stackoverflow.com/a/51384188/2410359) to "initialize" `b[9]` and still have only 9 elements. The result supports "warning text is a little misleading, but not technically incorrect." – chux - Reinstate Monica Jul 17 '18 at 14:42
  • https://stackoverflow.com/questions/3144904/may-i-take-the-address-of-the-one-past-the-end-element-of-an-array – user202729 Jul 18 '18 at 02:25
1

Some additional experimental results.


Using char b[9] instead of char b[] appears to make no difference: gcc issues the same warning.

Interestingly, initializing the one-past element via the "next" member in a struct 1) does quiet the "uninitialized" warning and 2) does not produce a warning about accessing outside the array.

#include <stdio.h>

typedef struct {
  char c[9];
  char d[9];
} TwoNines;

int main(void) {
  char b[9] = { 'N', 'i', 'c', 'e', ' ', 'y', 'o', 'u', '!' };
  printf("b[] size %zu\n", sizeof b);
  printf("b[9] = %d\n", b[9]);   // 'b[9]' is used uninitialized in this function [-Wuninitialized]

  TwoNines e = { { 'N', 'i', 'c', 'e', ' ', 'y', 'o', 'u', '!' }, //
                 { 'N', 'i', 'c', 'e', ' ', 'y', 'o', 'u', '!' } };

  printf("e size %zu\n", sizeof e);
  printf("e.c[9] = %d\n", e.c[9]);   // No warning.

  return 0;
}

Output

b[] size 9
b[9] = 0
e size 18    // With 18, we know `e` is packed.
e.c[9] = 78  // 'N'

Notes:
gcc -std=c11 -O3 -g3 -pedantic -Wall -Wextra -Wconversion -c -fmessage-length=0 -v -MMD -MP ...
gcc/gcc-7.3.0-2.i686

chux - Reinstate Monica
  • Re accessing the "next" member: That is legal because: 1. A pointer to the first member is equivalent to a pointer to the whole struct, 2. Any object can be safely cast to char array and dereferenced to inspect the byte representation, 3. `e.c` coincidentally happens to be a char array, so 4. You are type punning the whole struct into an array of char. – Kevin Jul 18 '18 at 04:46
-2

When you compile the code with -O2, the example is trivial enough that this variable is optimized out, so the warning is 100% correct.

0___________