-2

Is it true that merely passing an uninitialized variable to a function results in undefined behavior?

It seems really weird to me.

Suppose that we have the following code:

void open_db(db* conn)
{
  // Open database connection and store it in the conn
}

int main()
{
  db* conn;
  open_db(conn);
}

It seems perfectly legal to me. It neither dereferences the uninitialized variable nor relies on its state. It just passes an uninitialized pointer to another function, which stores some data in it via operator new or something like that.

If it's UB, could you quote the exact place where the Standard says so?

And is it also true for other types like int?

void foo(int bar)
{
  // ...
}

int main()
{
  int bar;
  foo(bar); // UB?
}
FrozenHeart
  • It's not UB per se. It starts to be if you're going to use it. – πάντα ῥεῖ Sep 24 '16 at 22:02
  • Did someone tell you it was UB? – NathanOliver Sep 24 '16 at 22:02
  • @πάντα ῥεῖ Are you sure? http://stackoverflow.com/a/39681217/1608835 – FrozenHeart Sep 24 '16 at 22:02
  • @NathanOliver Yep, see http://stackoverflow.com/questions/39681190/code-crashes-because-of-an-uninitialized-variable-even-if-i-dont-actually-use-i – FrozenHeart Sep 24 '16 at 22:03
  • @FrozenHeart Yes. – πάντα ῥεῖ Sep 24 '16 at 22:03
  • @πάντα ῥεῖ Did you see the question at the specified link then? – FrozenHeart Sep 24 '16 at 22:03
  • ***Using*** an uninitialized non-static local variable (no matter where it was defined) leads to UB. Just passing it to a function which does not use the contents or value of that variable is in itself not UB. – Some programmer dude Sep 24 '16 at 22:03
  • @Joachim Pileborg Yeah, that's what I thought too. See http://stackoverflow.com/questions/39681190/code-crashes-because-of-an-uninitialized-variable-even-if-i-dont-actually-use-i however – FrozenHeart Sep 24 '16 at 22:04
  • @FrozenHeart I've seen it as soon you posted it, and even added the [tag:language-lawyer] tag a few minutes later on. – πάντα ῥεῖ Sep 24 '16 at 22:05
  • @FrozenHeart Maybe you should add "passing *by value*"? The copy involves a read from an uninitialized variable. – juanchopanza Sep 24 '16 at 22:07
  • 2
    I fail to see the connection to "language-lawyer", really. It's folklore knowledge that reading from uninitialized variables is UB. And the code reads from uninitialized variables "conn" and "bar". There's no lawyering needed, not really. – Johannes Schaub - litb Sep 24 '16 at 22:07
  • Well, that question has to do with pointers, and pointers are always special. I can't quote a section from the specification, but I still think that it's the *using* part that is the key. Some may argue that passing a variable as an argument to a function *is* using it, and in a way I can agree, because the (indeterminate) value is copied and therefore "used". But it's not really used in the way that most people would define the word "used" if the code itself doesn't use it (besides the actual argument). – Some programmer dude Sep 24 '16 at 22:08
  • @Joachim If it's not _used_ the code is probably useless anyway as _@Hans Passant_ was already pointing out at their 1st comment at the other question. – πάντα ῥεῖ Sep 24 '16 at 22:14
  • There's no practical use for this, *"Open database connection and store it in the conn"* isn't possible anyway. – alain Sep 24 '16 at 22:14
  • @alain Why do you think so? – FrozenHeart Sep 24 '16 at 22:15
  • @πάντα ῥεῖ It is allocated inside the `open_db` function and used later – FrozenHeart Sep 24 '16 at 22:16
  • Because you pass the pointer by value, thus `main` will never see any change. – alain Sep 24 '16 at 22:16
  • @FrozenHeart You can't do that without passing a reference. You probably have some basic misconceptions. – πάντα ῥεῖ Sep 24 '16 at 22:17
  • @πάντα ῥεῖ Ok. But anyway, is it UB or not? – FrozenHeart Sep 24 '16 at 22:20
  • @FrozenHeart As mentioned, it's actually not, just useless. – πάντα ῥεῖ Sep 24 '16 at 22:20
  • 4
    It is also different between C and C++, from what I know. In C++ it is always UB, and in C it is UB only if the type has trap representations. If it hasn't then the value is merely unspecified. So my understanding of C is that you are allowed to read from uninitialized ints, pointers etc as long as they cannot have trap representations. In C++, you are not allowed and the compiler may go wild on you. However, I'm not familiar enough with C to be certain on this. – Johannes Schaub - litb Sep 24 '16 at 22:20
  • @Johannes Nice to see you're actively around. :) – πάντα ῥεῖ Sep 24 '16 at 22:22
  • C *means* to make this unconditionally UB (C99 J.2: "The behavior is undefined in the following circumstances: ... The value of an object with automatic storage duration is used while it is indeterminate".) However, Annex J is not normative and I can't find any normative language that actually backs that up in the case where the type has no trap representations. On the gripping hand, C compilers absolutely do treat reading any uninitialized variable as UB regardless of type and regardless of whether trap reps exist in the hardware. – zwol Sep 24 '16 at 22:24
  • @zwol Perhaps it's the unspecified nature of trap representations that formally gives it UB? In C++ at least, if UB happens only for some choices of unspecified behavior and not for others at runtime, the behavior is undefined nonetheless, regardless of which choice the implementation actually makes. – Johannes Schaub - litb Sep 24 '16 at 22:29
  • It's UB whether you use it or not because the lvalue-to-rvalue conversion of an uninitialized instance of something other than a C++ class is explicitly UB. – David Schwartz Sep 24 '16 at 22:33
  • @FrozenHeart Looks like you have to decide for a specific language tag, to get a concise answer. As stated so often c isn't c++ isn't c. – πάντα ῥεῖ Sep 24 '16 at 22:39
  • 1
    Please choose a single language tag, the answer is different for C than C++ and quite complicated to write up – M.M Sep 24 '16 at 22:44
  • @M.M Are you sure that it's different? – FrozenHeart Sep 24 '16 at 22:44
  • 3
    **−1** Asking for standards quote for 2 or more different languages. – Cheers and hth. - Alf Sep 24 '16 at 22:46
  • @Cheersandhth.-Alf IMNSHO it is perfectly fine to do that when the languages are C and C++. Questions about the ways in which C and C++ are / are not different for apparently-common constructs often need attention from *both* languages' lawyers to resolve. – zwol Sep 24 '16 at 23:31
  • 3
    @zwol: I do not see a language comparison question. On the contrary I see the OP asking for what "the" Standard says. To my eyes it's clear that the OP is treating the issue as language-independent, and/or thinks the C and C++ rules must be the same, given by a common standard. – Cheers and hth. - Alf Sep 24 '16 at 23:38
  • @zwol: Annex J just summarises rules stated in other chapters, possibly not as a single constraint, but as the result of applying multiple rules (A->B->C...). The issues listed are nevertheless mandatory. And the two languages are too different to have such issues answered in a single answer. – too honest for this site Sep 25 '16 at 01:01

4 Answers

4

It is UB, and the type of the argument does not matter. The relevant bits of C99 are: when you declare a variable with "automatic storage duration" but don't initialize it, its value is indeterminate (6.2.4p5, 6.7.8p10); any use of an indeterminate value provokes undefined behavior (J.2 refers to 6.2.4, 6.7.8, and 6.8)1.

And even if it weren't UB (for instance, if conn had been initialized), this code would not have the effect you seem to expect. As written, open_db cannot modify the variable conn in its caller.

A slight variation on your code is valid whether or not conn is initialized, and does do what you expect it to do, though:

void open_db(db **conn)
{
  *conn = internal_open_db();
}

int main()
{
  db *conn;
  open_db(&conn);
}

The address-of operator, unary &, is one of the very few things in the language that does not provoke undefined behavior when applied to an uninitialized variable, because it does not read the value of the variable; it only determines the variable's memory location. That location is a determinate value that can safely be passed to open_db (but note that the type signature has changed: open_db now receives a pointer to a pointer to a db). open_db can then use the pointer-dereference operator, unary *, to write a result into the variable.

In C++ only, this very common pattern receives a bit of syntactic sugar:

void open_db(db *&conn)
{
  conn = internal_open_db();
}

int main()
{
  db *conn;
  open_db(conn);
}

Changing the second star to an ampersand makes the conn argument to open_db now a "reference" to a pointer. It's still a pointer to a pointer "under the hood", but the compiler fills in the & and * operators for you as necessary.
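To make the two out-parameter styles above concrete, here is a minimal compilable sketch. The db struct, its handle field, and the body of internal_open_db are hypothetical stand-ins (the original only names internal_open_db), and the functions are renamed open_db_ptr and open_db_ref purely so both can live in one file.

```cpp
// Hypothetical stand-in for the real connection type.
struct db { int handle; };

// Hypothetical factory; a real one would actually open a connection.
inline db* internal_open_db() {
    static db instance{42};
    return &instance;
}

// Pointer-to-pointer style: valid in both C and C++.
inline void open_db_ptr(db** conn) { *conn = internal_open_db(); }

// Reference style: C++ only.
inline void open_db_ref(db*& conn) { conn = internal_open_db(); }

// Exercises both variants. Neither call reads the uninitialized pointers:
// the first takes an address, the second binds a reference.
inline bool both_styles_agree() {
    db* a;               // uninitialized, but only its address is taken
    db* b;               // uninitialized, but only bound to a reference
    open_db_ptr(&a);     // callee writes through the pointer
    open_db_ref(b);      // callee writes through the reference
    return a == b && a->handle == 42;
}
```

In both variants the first read of a or b happens only after the callee has stored a value, which is why neither depends on the pointers being initialized beforehand.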


1 For my fellow language lawyers: Annex J is non-normative, and I can't find any normative statement backing up its assertion that using an indeterminate value is always UB. (It might help if I could find a definition of what it means to "use a value" in the first place. I believe the intent was anything that triggers 6.3.2.1p2 "lvalue conversion", but I don't think that's ever actually stated.)

The definition of an "indeterminate value" is "an unspecified value or a trap representation"; using an unspecified value does not provoke UB. Using a trap representation does provoke UB, but not all types have trap reps. C11, but not C99, has a sentence in 6.3.2.1p2 that states quite baldly "if [the code reads a value from] an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized, the behavior is undefined" -- but note that it doesn't use the term-of-art "indeterminate value" here, and it restricts the rule to variables whose address is not taken.

However, C compilers absolutely do treat reading any uninitialized variable as UB regardless of whether its type has trap reps or whether its address has been taken, and J.2 certainly reflects the intent of the committee, as do a number of examples in clause 7 where the word "indeterminate" appears solely to point out that reading some variable is UB.

zwol
  • The way I read the C spec, it is only undefined behavior if the type can have trap representations. Because an indeterminate value is: "unspecified value or a trap representation". However, whether or not a type has traps is unspecified, so an implementation doesn't have to document it. But if it does (x86 ABI), I don't think it's OK for the compiler to go crazy on you if you read an indeterminate value. – Johannes Schaub - litb Sep 24 '16 at 22:25
  • "any use of an indeterminate value provokes undefined behavior" -- and why do you think that passing such variable means that we actually **use** it? – FrozenHeart Sep 24 '16 at 22:25
  • @FrozenHeart Because the standard says that evaluating an expression whose value is indeterminate is UB. You don't have to do anything special with the indeterminate value beyond evaluating something that produces one. (Here, see the section on the required lvalue-to-rvalue conversion, which is UB if the value is indeterminate.) – David Schwartz Sep 24 '16 at 22:27
  • @DavidSchwartz Oh, this is interesting. N1570 6.3.2.1p2: "...If the lvalue [that's about to be converted to what we no longer call an 'rvalue'] designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined." That language was added in C11 -- no such sentence appears in C99. – zwol Sep 24 '16 at 22:35
  • @zwol they realized not even "char" is safe of exceptions on a read from the register for uninitialized variables on the intel itanium processor. If the address is taken, the variable is forced to the memory and read from there. C++ did follow the C committee here with http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#240 – Johannes Schaub - litb Sep 24 '16 at 22:54
2

This generates a warning:

int a;
int b = a; // warning: a is uninitialized but is used to initialize b (UB)

Since using a variable with an indeterminate value is UB, passing an uninitialized variable to a function by value should be UB as well, because it involves the same kind of copy. The compiler performs no such check at the call site, however, so no warning is generated.

If this isn't some amazing exception, this is definitely Undefined Behaviour.
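For contrast, a minimal sketch of the well-defined variant: initialize the variable before it is copied into the parameter. The body of foo and the helper name call_foo_safely are hypothetical, added only so the snippet compiles and can be checked.

```cpp
// Hypothetical callee that takes its argument by value.
inline int foo(int bar) { return bar + 1; }

inline int call_foo_safely() {
    int bar{};        // value-initialized to 0, so the copy below reads a determinate value
    return foo(bar);  // pass by value: copies bar into foo's parameter
}
```

With the initializer in place, the copy made for the call reads a known value, so the question of UB never arises.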

xinaiz
1

In C: This thread covers all the issues related to reading uninitialized variables; it's somewhat complicated.

Passing a variable by value to a function requires reading it, obviously.

For C++14 see this answer.

Both of your code samples are undefined behaviour in both languages.

Community
M.M
0

In the example you give

void open_db(db* conn)
{
  // Open database connection and store it in the conn
}

int main()
{
  db* conn;
  open_db(conn);
}

the variable, conn, in main is an uninitialized pointer.

You then pass a copy of that to open_db. You are not passing the address of the pointer; you are passing its uninitialized value, which the callee receives as if it were an address.

This requires a read of the uninitialized pointer in order to populate the copy used in open_db.

The compiler is free to recognize this and either perform the read, with potential undefined-behavior consequences (it is feasible that the program might crash performing such a read), or elide the copy and just let open_db's conn parameter be differently undefined.

Based on other comments I've read, I believe you're trying to be clever and say "Aha! But I always initialize conn inside db_conn and never read it without initializing it".

Ok... That's ... perverse.

void db_conn(db* conn)
{
    db* new_conn = db_connection_helper();
    if (!new_conn) {
        log_error("Couldn't open database");
        return;
    }
    log_success("Opened database");
    conn = new_conn;
    configure_db_connection(conn);  // first read: guaranteed initialized
    setup_stored_procedures(conn);
}

In this function, conn was passed by value, so conn is a copy of whatever argument was passed to us. Any assignments made to it in the body of db_conn are invisible to the caller.
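A minimal sketch of that invisibility, using an initialized pointer so the by-value copy itself is well defined. The db struct and the body of db_connection_helper are hypothetical stand-ins, and db_conn is reduced to the single assignment that matters here.

```cpp
// Hypothetical stand-in for the real connection type.
struct db { int handle; };

// Hypothetical helper; a real one would open a connection.
inline db* db_connection_helper() {
    static db instance{7};
    return &instance;
}

// Takes the pointer by value, so conn is a local copy.
inline void db_conn(db* conn) {
    conn = db_connection_helper();  // modifies only the local copy
    (void)conn;                     // silence "set but not used" warnings
}

inline bool caller_pointer_unchanged() {
    db* conn = nullptr;   // initialized, so copying it is well defined
    db_conn(conn);
    return conn == nullptr;  // the caller's pointer was never touched
}
```

The caller's conn is still null after the call, whatever db_conn assigned to its own copy.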

Indeed, the optimizer is quite likely to treat this code

    conn = new_conn;
    configure_db_connection(conn);  // first read: guaranteed initialized
    setup_stored_procedures(conn);

as

    configure_db_connection(new_conn);  // first read: guaranteed initialized
    setup_stored_procedures(new_conn);

We can easily see this in the assembly

typedef struct db_t {} db;

extern db* db_conn_helper();
extern void db_configure(db*);

void db_conn1(db* conn)
{
  db* new_conn = db_conn_helper();
  if (!new_conn)
    return;
  conn = new_conn;
  db_configure(conn);
}

void db_conn2(db* conn)
{
  db* new_conn = db_conn_helper();
  if (!new_conn)
    return;
  db_configure(new_conn);
}

produces

db_conn1(db_t*):
        subq    $8, %rsp
        call    db_conn_helper()
        testq   %rax, %rax
        je      .L1
        movq    %rax, %rdi
        addq    $8, %rsp
        jmp     db_configure(db_t*)
.L1:
        addq    $8, %rsp
        ret
db_conn2(db_t*):
        subq    $8, %rsp
        call    db_conn_helper()
        testq   %rax, %rax
        je      .L5
        movq    %rax, %rdi
        addq    $8, %rsp
        jmp     db_configure(db_t*)
.L5:
        addq    $8, %rsp
        ret

So this means that if your code tries to use conn in main, you're still seeing undefined behavior:

int main() {
    db* conn;  // uninitialized
    db_conn(conn);  // passes uninitialized value
    // our 'conn' is still uninitialized
    query(conn, "SELECT \"undefined behavior\" FROM DUAL");  // UB
}

Perhaps you meant either

conn = db_conn();  // initializes conn

or

db_conn(&conn);  // only undefined if db_conn tries to use *conn

This requires db_conn to take a db**.

db* conn => uninitialized
db** &conn => initialized pointer to an uninitialized db* pointer
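Both alternatives can be sketched side by side. The db struct, the body of db_connection_helper, the bool return value of the out-parameter overload, and the demo function are assumptions for illustration; only the two call shapes come from the text above.

```cpp
// Hypothetical stand-in for the real connection type.
struct db { int handle; };

// Hypothetical helper; a real one would open a connection or return nullptr.
inline db* db_connection_helper() {
    static db instance{1};
    return &instance;
}

// Option 1: return the connection, so the caller initializes conn directly.
inline db* db_conn() { return db_connection_helper(); }

// Option 2: out-parameter; writes through the pointer and reports success.
inline bool db_conn(db** conn) {
    db* new_conn = db_connection_helper();
    if (!new_conn)
        return false;     // *conn is left untouched on failure
    *conn = new_conn;
    return true;
}

inline bool demo_both_options() {
    db* first = db_conn();       // initialized at the point of declaration
    db* second = nullptr;        // defensively initialized before the call
    bool ok = db_conn(&second);  // only the address of second is passed
    return ok && first == second;
}
```

Initializing second to nullptr is not required for the call itself, but it keeps the variable determinate if the out-parameter version fails and leaves it untouched.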
kfsone