2

Is it safe to assume that the if statement will always work if the variable is uninitialized? My assumption is yes, but I have been told that an uninitialized variable can hold random garbage bits, so the null check will not always work.

void afunction(void) {
    char* someStr;
    if (someStr) 
    {
        // do something
    }
}
Baptiste Wicht
  • I'm pretty sure that's not a guarantee. I think it's implementation dependent on where a created pointer is pointing. – Falmarri Feb 05 '13 at 22:31
  • @Falmarri Reading uninitialized variables is **undefined behavior**, not “implementation dependent”. See http://markshroyer.com/2012/06/c-both-true-and-false/ and http://kqueue.org/blog/2012/06/25/more-randomness-or-less/ – Pascal Cuoq Feb 05 '13 at 22:33
  • @PascalCuoq: Right, that's regular variables. I'm not sure if uninitialized pointers are the same or not. – Falmarri Feb 06 '13 at 16:30

4 Answers

9

Is it safe to assume that the if statement will always work if the variable is uninitialized?

No. Reading uninitialized storage invokes undefined behavior. You can't make safe assumptions about this code.

Don't do this!
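
As a minimal sketch of a well-defined variant of the function from the question, assuming the intent is to act only once the pointer has been set: the pointer is given an explicit initial value, so the later check reads a known value. The `assign` parameter, the `greeting` buffer, and the return value are hypothetical additions, there only so the behavior can be observed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical variant of afunction(): returns 1 if the branch was taken,
   0 otherwise. The `assign` parameter is not part of the original question. */
int afunction(int assign)
{
    static char greeting[] = "hello";
    char *someStr = NULL;   /* explicit initialization: reading it is well defined */

    if (assign) {
        someStr = greeting;
    }

    if (someStr) {          /* taken only when someStr is non-null */
        puts(someStr);
        return 1;
    }
    return 0;
}
```

Calling `afunction(0)` skips the branch and `afunction(1)` takes it. Neither call reads an indeterminate value.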

datenwolf
  • As a side note, in every C/C++ implementation I have seen, this will almost surely initialize `someStr` with an old value (of some other variable) from the stack. Compilers warn you about this for good reason. – John Colanduoni Feb 05 '13 at 22:33
  • @HevyLight If the uninitialized variable gets a consistent old value that was lying on the stack, you should consider yourself lucky. An optimizing compiler will not balk at treating the uninitialized read as unreachable (since it is undefined behavior) or at treating the variable as having several values. See the two links I posted in a comment to the question for entertaining examples. – Pascal Cuoq Feb 05 '13 at 22:43
  • @HevyLight: This is indeed the case. If you know what you're doing this can be useful behavior. For example, OpenSSL uses this as an additional source of entropy. The compiler warnings issued made some Debian maintainer, who didn't know what he was doing, remove the "offending" code together with some other important things, reducing the entropy pool size of Debian's OpenSSL implementation to a mere 16 bits (based on the process ID). This is known as the Debian cryptography disaster. – datenwolf Feb 05 '13 at 22:43
  • “If you know what you're doing this can be useful behavior”. No! Please see http://kqueue.org/blog/2012/06/25/more-randomness-or-less/ – Pascal Cuoq Feb 05 '13 at 22:56
  • @PascalCuoq: The way OpenSSL uses uninitialized memory as a source of additional entropy is a lot more robust than what's presented in that blog post. If it were a simple XOR on a variable, it would not really improve the entropy quality. What OpenSSL does is feed the values of several entropy sources, one after another, into a cryptographic hash function, pushing the resulting bits into a feedback shift register. The nice thing about this is that the entropy can never shrink if done that way. Even if the compiler optimizes away some parts, it will not reduce the bits already present. – datenwolf Feb 05 '13 at 23:01
  • @PascalCuoq: Writing this kind of robust code is what I meant with "If you know what you're doing". – datenwolf Feb 05 '13 at 23:03
  • @datenwolf There is no robust way to invoke undefined behavior. When LLVM improves inter-function optimizations, it will realize that OpenSSL is passing a pointer to uninitialized data to a crypto hash function that reads from it, and it will assume that the call is dead code. This is what it already does, it is only not very good at it yet. Programmers will cry wolf and compiler makers will disdainfully explain that the code was wrong to rely on undefined behavior. It has happened before and it will happen again. http://blog.regehr.org/archives/213 http://blog.regehr.org/archives/759 – Pascal Cuoq Feb 05 '13 at 23:18
  • @PascalCuoq: Yes LLVM will assume that particular call dead code. But the entropy pool is filled by further function calls, and those are valid in the compiler's eye. If that one function call seeding the entropy pool with undefined values is optimized out the result would be no different to seeding the entropy pool with a stream of zeroes. The OpenSSL people knew exactly what they were doing and are fully aware of the possibility of such code generation. – datenwolf Feb 05 '13 at 23:53
  • In `entropy0(); entropy1(); entropy2(); entropy3()`, if `entropy2()` systematically invokes undefined behavior, the compiler does not have to generate the call to `entropy3()`. It does not have to generate the call to `entropy1()`. It can generate whatever code it wants, because there is undefined behavior. It can even leave the call to `entropy0()` if it wants to, initializing the entropy pool to always the same value, because it is undefined behavior. – Pascal Cuoq Feb 06 '13 at 00:08
  • Why should it not have to generate a call up to `entropy1()`? The entropy1 sequence point clearly comes before the undefined call to entropy2 so any undefined behavior happens only after entropy1, and due to sequence point ordering the compiler *must* generate a well defined call to entropy1. If it doesn't it violates the language specification. C is an imperative language so any inferences on any later writes are not allowed and make no sense. If C was some lazy evaluated functional language, well, then yes, but not in this case. – datenwolf Feb 06 '13 at 00:11
5

This is absolutely not guaranteed to always work. You have to initialize it yourself.

char* someStr = NULL;

or some other value.
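
The "other value" can simply be the result the pointer is meant to hold, assigned at the point of declaration. The sketch below assumes a hypothetical helper, `value_length`, that looks for a `:` separator in a line. `strchr` returns a null pointer when the character is absent, so the check that follows always reads an initialized value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the text after the first ':'
   in `line`, or 0 when there is no ':'. */
size_t value_length(const char *line)
{
    /* Initialized at declaration: either points at the ':' or is NULL. */
    const char *sep = strchr(line, ':');

    if (sep) {
        return strlen(sep + 1);
    }
    return 0;
}
```

Declaring the pointer where its value is first known leaves no window in which it is uninitialized.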

Joshua Weinberg
2

Uninitialized variables are indeterminate. Reading them prior to assigning a value results in undefined behavior.

It is quite easy to check if a pointer is NULL:

if (!someStr) {
   // someStr is NULL: don't use it (or do for some weird reason)
}

To be on the safe side and make sure the pointer is the value you want it to be, I would assign it a value upon initialization.

char* someStr = NULL;

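You could also make the pointer static. Objects with static storage duration are initialized to a null pointer when no initializer is given, so reading them is well defined. Keep in mind that a static variable retains its value between calls.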

static char* someStr;
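
A minimal sketch of the static variant, assuming a hypothetical function `was_set` that reports whether the pointer had already been assigned on entry. The `buffer` array is likewise made up for illustration.

```c
#include <stddef.h>

/* Hypothetical function: returns 1 if the static pointer was already set
   when the function was entered, 0 otherwise. */
int was_set(void)
{
    static char *someStr;          /* static storage: starts as a null pointer */
    static char buffer[] = "set";

    if (someStr) {
        return 1;
    }
    someStr = buffer;              /* the assignment persists across calls */
    return 0;
}
```

The first call returns 0 and every later call returns 1. That persistence is the trade-off: a static local is shared by all calls, so it is not a drop-in replacement for an automatic variable initialized to NULL.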
syb0rg
  • I liked your answer when it only said “Uninitialized variables are indeterminate. Reading them prior to assigning a value results in undefined behavior”, but now you have added a recommendation that I do not understand. `if (someStr == NULL)` is not different from `if (someStr)`. You shouldn't write either when `someStr` is uninitialized. Why did you add this part? – Pascal Cuoq Feb 05 '13 at 22:40
  • For me it was preference, that's really the only reason why. `if` statements are used to check `boolean` values, which are supposed to be `0` and `1`, and not really meant for `NULL` values. That way you don't know if the pointer is actually `NULL`, or `0`. I can edit it back to the original if you would like. – syb0rg Feb 05 '13 at 22:44
  • I think the stylistic preference* has lower priority than the undefinedness of reading from an uninitialized pointer, and that an answer to the question should concentrate on the latter. * C99 6.8.4.1:2 says “the first substatement is executed if the expression compares unequal to 0”, which allows the condition expression to be an integer, a pointer, or even a floating-point number. – Pascal Cuoq Feb 05 '13 at 22:49
  • I'll revert it to the original, but `NULL` and `0` have different hex representations, which is why I used my specific if statement. I believe that `0` is `0x30` in hex, and `NULL` is `0x01` in hex. – syb0rg Feb 05 '13 at 22:54
  • 6.3.2.3:3 “An integer constant expression with the value 0, […], is called a null pointer constant” (also, the standard allows for several null pointer constants, as implied by the “a” in the quote, but all null pointer constants must compare equal (6.5.9:6)) – Pascal Cuoq Feb 05 '13 at 23:05
  • Hmm, I guess I always assumed that those two values were not equal in C because of their hex representations. Thanks for teaching me something new! – syb0rg Feb 05 '13 at 23:08
1

The value of someStr is not defined. In general it will be set to some old value lying around on the stack. So, it may well be NULL (that is, 0).

Andrew Stein