4

I found a particular usage pattern that seems completely ok and no compiler has ever complained on before. Now it raises a warning with gcc-11: A close to minimal example is below. Save as t.c and compile using gcc-11 -O2 -Wall -c t.c.

#include <stdlib.h>
#include <string.h>

extern void f(const char *s);

void
t(int len, char *d)
{ char tmp[500];
  char *s, *o;
  int i;

  if ( len <= sizeof(tmp) )
    s = tmp;
  else if ( !(s=malloc(len)) )
    return;

  for(o=s,i=0; i < len; i++)
    *o++ = d[i]+1;

  f(s);
//  i = strlen(s);
  if ( s != tmp )
    free(s);
}

Compiling results in:

gcc-11 -O2 -Wall -c t.c
t.c: In function ‘t’:
t.c:20:3: warning: ‘s’ may be used uninitialized [-Wmaybe-uninitialized]
   20 |   f(s);
      |   ^~~~
t.c:4:13: note: by argument 1 of type ‘const char *’ to ‘f’ declared here
    4 | extern void f(const char *s);
      |             ^
t.c:20:3: warning: ‘tmp’ may be used uninitialized [-Wmaybe-uninitialized]
   20 |   f(s);
      |   ^~~~
t.c:4:13: note: by argument 1 of type ‘const char *’ to ‘f’ declared here
    4 | extern void f(const char *s);
      |             ^
t.c:8:8: note: ‘tmp’ declared here
    8 | { char tmp[500];
      |        ^~~

Now there are some observations

  • calling strlen(s) instead of f(s) does not result in a warning. Note that both accept const char* (checked in /usr/include/string.h)
  • If I remove the const in the declaration of f(), the problem vanishes too.
  • Calling using f((const char*)s) doesn't help either.
  • Initializing s as in char *s = NULL does not help either (and I do not want that anyway as it hides correct warnings about not initializing a variable).

I have learned that claiming something is a GCC bug is typically rightfully proven to be wrong. So I first check here what I'm missing.

edit As I cannot put code in the comment, here is code that disproves part of the claim. This compiles fine:

extern void f(const char *__s);

void
t(int len, char *d)
{ char tmp[500];
  char *s=tmp, *o;
  int i;

  for(o=s,i=0; i < len; i++)
    *o++ = d[i]+1;

  f(tmp);
}
Jan Wielemaker
  • 1,670
  • 10
  • 18
  • 2
    Try to add `if (len < 1) return;` in the start of the function – Support Ukraine Nov 01 '21 at 20:26
  • Your second code example doesn't change that the compiler was correct for the first code part. – Support Ukraine Nov 01 '21 at 20:32
  • There's many problems with this code unrelated to the question - it is needlessly obscure and complicated. Essentially you just need a malloc call, a memcpy call, a custom function call and a free call. You should also give the variables and functions meaningful names. I'd recommend posting the working code on https://codereview.stackexchange.com/ – Lundin Nov 02 '21 at 10:41
  • @Lundin, simplified code ([mcve]s) are not suitable for [codereview.se] - who knows how well the real code is written? – Toby Speight Nov 02 '21 at 10:43
  • @TobySpeight Yes they should post the real code, but this minimal example has very likely inherited a lot of code smell from it. Either way, this code is desperately in need of a complete make-over. – Lundin Nov 02 '21 at 10:47

2 Answers2

4

The compiler is correct, although the English phrasing is imperfect.

Assuming len is always positive, a fix is to insert if (len <= 0) __builtin_unreachable(); in the function. This tells the compiler that len is always positive, which means some data must be written to the memory s points to before f is called.

When the compiler says “‘s’ may be used uninitialized,” it does not mean that the value of s may be used but rather that what it points to may be used, and that pointed-to-memory is not initialized. Note that s is passed to a function that takes a const char *s, suggesting that function will not modify the data s points to and therefore expects it to contain data already. The C standard does not strictly require this; as long as the pointed-to-memory was not defined with const, f is allowed to reconvert the pointer to its original char * and modify the data there, but the implication of the parameter declaration is that it will not.

We can confirm this by changing the body of the function to:

char tmp[500];
f(tmp);

Then the compiler complains “warning: 'tmp' may be used uninitialized.” It is patently obvious that tmp, as passed to the function, is not uninitialized; it will be the address of the array. So the compiler must be warning that it is the contents of the array that may be used uninitialized.

Note that, while the loop starting with for(o=s,i=0; i < len; i++) superficially appears to initialize the data that s points to, it does not if len is zero. And since s is passed to f without len, f has no way of knowing nothing is in s (except by some side channel such as using external objects). So presumably f reads at least some data in s in every call.

Here is a smaller example:

#include <stdlib.h>

extern void f(const char *s);

void t(int len)
{
    char *s = malloc(len);
    f(s);
    free(s);
}

Presumably, len is always positive. To tell GCC this, insert this line in the function:

if (len <= 0) __builtin_unreachable();

That results in no new code generation, but the warning vanishes. (Actually, the generated code gets smaller, partly because the compiler can enter the for loop without first testing i < len.)

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • Isn't the data `s` points to initialized here: `for(o=s,i=0; i < len; i++) *o++ = d[i]+1;` ? – Support Ukraine Nov 01 '21 at 20:18
  • 1
    @4386427: Not if `len` is zero, and `f` has no way of knowing that. It is not passed `len`, so, if it is going to do anything with `s`, it would have to inspect the contents. There is no other way for it to learn that `s` contains nothing, except we could postulate `len` or other information has previously been stored in some external object that `f` accesses. But the compiler warning does say “may” be used uninitialized, so it is allowing for abnormal things like that. – Eric Postpischil Nov 01 '21 at 20:19
  • 1
    Good point. For all `len < 1` the code has a problem. I added `if (len < 1) return;` in the start of the function and the warnings was gone... – Support Ukraine Nov 01 '21 at 20:21
  • This is not the whole story. I've added another piece of code that, according to the above, should raise a warning, but it does not (see __edit__). That makes my original code fine (the 0 len is fine as the actual code passes the len along to `fI()`). The next issue is how to tell the compiler to keep silent (hopefully without #pragma)? – Jan Wielemaker Nov 01 '21 at 20:33
  • The other really weird issue is that there is no problem calling strlen(). Ok, that is built-in, so I also tried open(). That too works fine without warnings. I can't find anything out of the ordinary in the system headers that may explain the difference either? – Jan Wielemaker Nov 01 '21 at 20:35
  • 1
    @JanWielemaker: My answer says the compiler correctly detects the problem in the first code. It does not assert the detects the problem whenever it appears. Your second example would disprove that second assertion, but it does not disprove the first. Further, if we include `` and insert `assert(0 < len);` in the first code, the warning vanishes, thus proving the compiler’s concern is in the case when `len` may be zero (or negative). – Eric Postpischil Nov 01 '21 at 20:36
  • If you want the compiler to be quiet you would need to do what it is asking (ensure all code paths use initialized variables) or don't use Wall. – bauman.space Nov 01 '21 at 21:16
  • The assert() and `__builtin_unreachable` solutions work for this simplified example, but not for the real code :( Best I can see right now is to initialize the first byte of both the tmp buffer _and_ the `s` to 0. I now understand the reasoning of the compiler. I still don't like it. `const` tells the compiler the called function doesn't change any values through the pointer so it can avoid reloading the data. There is no guarantee how much of the pointed to data is initialized. In this case, that can be nothing, but the function the array is passed to knows that. – Jan Wielemaker Nov 02 '21 at 08:04
  • Got another location in the same code base where putting the 0-bytes didn't help either. Only branching on the length immediately and use the stack or heap, repeating the code helped. Surely @EricPostpischil got the reasoning right. The exact conditions are still a mystery to me. – Jan Wielemaker Nov 02 '21 at 08:14
  • @JanWielemaker: Show us the code for which the `__builtin_unreachable` method does not eliminate the warning. – Eric Postpischil Nov 02 '21 at 11:55
  • @JanWielemaker: Re “In this case, that can be nothing, but the function the array is passed to knows that”: How can the function know no data at the location it was passed is initialized? Through some other parameter? Through an external object? – Eric Postpischil Nov 02 '21 at 11:57
  • @EricPostpischil Code is at https://github.com/SWI-Prolog/packages-ssl/blob/f38871d674f96f53d1f1960d34c660389b11a9a5/cryptolib.c#L65. Delete the commented initializations. This also shows how the receiver knows how much to take as that is passed along. – Jan Wielemaker Nov 02 '21 at 16:26
  • @JanWielemaker: In that code, you can avoid the warning by inserting `if (end <= data) __builtin_unreachable();` before the `for` loop. Note that should be done only if `end <= data` is always false. Otherwise, it gives the compiler a false premise, and that can result in errant behavior. You can, of course, turn off the warning with `-Wno-maybe-uninitialized`. – Eric Postpischil Nov 02 '21 at 23:35
0

It was explained in the gcc-11 release notes

https://www.gnu.org/software/gcc/gcc-11/changes.html

maybe-uninitialized is now default included

https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Warning-Options.html#index-Wmaybe-uninitialized

compiler emits a warning if it cannot prove the uninitialized paths are not executed at run time.

Even if it is clear to the author, the onus is on the compiler to prove data is initialized, otherwise it sets a warning.

-Wmaybe-uninitialized is triggered with -Wall

The real question becomes- why not just initialize the data?

This is example code but it looks like a vulnerability waiting for an exploit given that tmp can be any arbitrary non-null memory, presumably len is there because there's no guarantee of a null terminator (or embedded nulls may be present), but len isn't passed through to f.


#include <stdlib.h>
#include <string.h>

extern void f(const char *s);

void
t(int len, char *d)
{ char tmp[500] = {0};  //ensure nulls
  char *s = tmp;
  char *o = s;
  int i;

  if ( len <= sizeof(tmp) )
    s = tmp;
  else if ( !(s=malloc(len)) )
    return;
  memset(s+len, 0, 1); // ensure a trailing null
  for(i=0; i < len-1; i++) // leave that null there at least
    *o++ = d[i]+1;

  f(s);
//  i = strlen(s);
  if ( s != tmp )
    free(s);
}

gcc compiles it with no warnings

gcc -O2 -Wall -c t.c
bauman.space
  • 1,993
  • 13
  • 15
  • `memset(s+len, 0, 1);` if `len >= 500` looks like UB. – Ted Lyngmo Nov 01 '21 at 20:45
  • Solid point. Probably should have a better handling around the stack to heap migration. Whether it's on the stack or heap, initialize with a trailing null solves this problem and other input validation issues – bauman.space Nov 01 '21 at 21:06