In C, this pattern is fairly common:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int init_ptr_or_return_err(int *p) {
  srand(time(NULL));
  // random to make code compile/demonstrate the question
  int error = rand() % 2;
  if (error) {
    return 1;
  }
  *p = 10;
  return 0;
}

int main() {
  int a;
  init_ptr_or_return_err(&a);
  printf("a = %d\n", a);
  return 0;
}

In the main function above, without checking the return code of the function, accessing the value of a might be undefined at runtime (but this is not statically determinable). So, it is usually wrapped in a block such as:

if (init_ptr_or_return_err(&a)) {
  // error handling
} else {
  // access a
}

In this case, the compiler knows that a is initialized in the else because the function returns 0 if and only if it sets a. So, technically, accessing a in the else is defined, but accessing a in the if is undefined. However, return 0 could easily be "return some fixed, but statically unknown value from a file" (and then check just that value before accessing a). So in either case, it isn't statically determinable whether a is initialized or not.

Therefore, it seems to me like in general, the compiler cannot statically decide if this is undefined behavior or not and therefore should not be able to e.g. optimize it out.

What are the exact semantics of such code (is it undefined behavior, something else, or is there a difference between static and runtime undefined behavior) and where does the standard specify this? If this is not defined by the standard, I am using gcc, so answers in the context of gcc would be helpful.

user1413793
  • Your question is not entirely clear because you say "Here's some code, now here is chunk you could substitute, now here are some other ideas, is it UB?" Can you just post a single, complete program instead, and ask about that? – John Zwinck Jul 22 '18 at 01:52
  • The function `init_ptr_or_return_err()` has two return points, with returned values of 0 or 1. However, in `main()`, the call to `init_ptr_or_return_err()` fails to check the returned value. – user3629249 Jul 22 '18 at 18:39
  • In `init_ptr_or_return_err()`, the passed parameter is only set under certain conditions. It should always be set to some initial or updated value. – user3629249 Jul 22 '18 at 18:41

2 Answers

The vast majority of undefined behavior is not statically determinate. Most undefined behavior is of the form "if this statement is reached, and these conditions are met, the program has undefined behavior".

That's the case here. When the program is invoked at a time such that rand() returns an odd number, it has undefined behavior. When it's invoked at a time such that rand() returns an even number, the behavior is well-defined.

Further, the compiler is free to assume you will only invoke the program at a time when rand() returns an even number. For example it might optimize out the branch to the return 1; case, and thereby always print 10.

R.. GitHub STOP HELPING ICE
  • While text in the question mentions the possibility that `a` is accessed in the first substatement of the `if`, it is not explicitly shown or clearly stated. Answering the question in this form is confusing. Also, as the address of `a` is taken, it is not subject to the statement in C 2011 6.3.2.1 2 that makes accessing some uninitialized things undefined behavior. Whether there is undefined behavior here (if `a` is accessed in the first substatement of the `if`) is implementation-dependent, further muddying the question. – Eric Postpischil Jul 22 '18 at 02:40
  • Since the function has external linkage, the compiler cannot know that the return value will not be consulted at every call site. So I don't see how it could eliminate the `return 1` from the compiled function. What it could do is inline a simplified version at this particular call site. – rici Jul 22 '18 at 06:56
  • @rici: If the compiler makes an inlined version and an externally visible version, the linker can remove the externally visible function when making the final executable, resulting in a C implementation that effectively eliminates the `return 1`. (The macOS linker has features for this sort of dead code elimination, although it is not doing so by default in this case.) – Eric Postpischil Jul 22 '18 at 10:35
  • @eric sure, but only from that executable. The check is still in the object file. The point is that the optimisation does not affect correct uses of the function (which do check the return value) in other translation units. The bug is a bug, and is not exacerbated by the optimisation, which is legitimate *at the call site*. – rici Jul 22 '18 at 13:08

This code does not necessarily invoke undefined behavior. It is a wide-spread myth that reading an uninitialized variable always invokes undefined behavior. UB only occurs in two special cases:

  • Reading uninitialized variables that didn't have their address taken, or
  • Running on an exotic system with trap representations for plain integers, where an indeterminate value could be a trap.

On mainstream 2's complement platforms (x86/x64, ARM, PowerPC, almost anything...), this code will merely use an unspecified value and invoke unspecified behavior. This is because the variable had its address taken. This is explained in detail here.

This means that the result isn't reliable, but the code will execute as expected, without optimizations going bananas etc.

Indeed the compiler will most likely not optimize out the function, but that is because of the time() call and not because of some poorly-defined behavior.

Lundin