While C++ is an extraordinary language, it also has many pitfalls, especially for inexperienced programmers. I'm talking about things like uninitialized members of primitive type in a class, e.g.

struct Data { // struct, so that the members are accessible below
  std::string name;
  unsigned int version;
};

// ...

Data data;
if (data.version) { ... } // use of uninitialized member

I know this example is oversimplified, but in practice even experienced developers sometimes forget to initialize their member variables in constructors. While leaving primitives uninitialized by default is probably a relic from C, it gives us a choice between performance (leave some data uninitialized) and correctness (initialize all data).
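For comparison, C++11 already lets a class opt into the "correct by default" side of that trade-off. This is only a minimal sketch reusing the `Data` type from above; the default value `0` and the helper `make_data` are assumptions made for illustration.

```cpp
#include <string>

struct Data {
    std::string name;          // class type: default-constructed to an empty string
    unsigned int version = 0;  // default member initializer (C++11), assumed default
};

// Hypothetical helper: an empty brace initializer yields a fully
// initialized object, even for scalar members without their own initializer.
inline Data make_data() {
    return Data{};
}
```

With the default member initializer in place, a plain `Data data;` no longer leaves `version` indeterminate.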

OK, but what if the logic were inverted? I mean, what if all primitives were either initialized with zeros, or required explicit initialization, with its absence being a compile error? Of course, for full flexibility one would have a special syntax/type to leave a variable/member uninitialized, e.g.

unsigned int x = std::uninitialized_value;

or

Data::Data() : name(), version(std::uninitialized_value) {}

I understand this could cause problems with existing C++ code which allows uninitialized data, but the new code could be wrapped in a special block (extern "C" comes to my mind as an example) to let the compiler know that a particular piece of code shall be strictly checked for uninitialized data.
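The "explicit initialization or compile error" variant can be approximated today as a library type, without any language change. The following is only a sketch under stated assumptions: `uninitialized_t`, `uninitialized_value` (standing in for the proposed `std::uninitialized_value`) and the wrapper `strict` are hypothetical names, not existing standard facilities.

```cpp
#include <type_traits>

// Hypothetical tag standing in for the proposed std::uninitialized_value.
struct uninitialized_t {};
constexpr uninitialized_t uninitialized_value{};

// Hypothetical wrapper: a scalar that cannot be declared without stating
// how it starts out.
template <typename T>
class strict {
public:
    strict() = delete;                    // "strict<int> x;" does not compile
    strict(T value) : value_(value) {}    // explicit initial value
    strict(uninitialized_t) {}            // deliberate opt-out: value_ stays indeterminate

    strict& operator=(T value) { value_ = value; return *this; }
    operator T() const { return value_; } // reading is only valid once a value was set

private:
    T value_;
};

static_assert(!std::is_default_constructible<strict<int>>::value,
              "a strict<int> must be initialized explicitly");
```

Usage would then look like `strict<unsigned int> x = uninitialized_value;`, followed by an assignment before the first read.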

Putting compatibility issues aside, such an approach would result in fewer bugs in our code, which is what we are all interested in.

  1. Have you ever heard about any proposal like this?
  2. Does such a proposal make sense at all?
  3. Do you see any downsides of this approach?

Note 1: I used the term "strict" because this idea is related to JavaScript's "strict mode", which, as the Mozilla Developer Network site puts it,

eliminates some JavaScript silent errors by changing them to throw errors

Note 2: Please don't pay attention to the proposed syntax used in the proposal, it's there just to make a point.

Note 3: I'm aware that tools like cppcheck can easily find uninitialized member variables, but my idea is about compile-time support for this kind of check.

Adam Romanek
  • I do like the idea of variables being initialized by default, but allow manual selection of uninitialized. – Neil Kirk Feb 17 '15 at 22:53
  • If we put compatibility issues aside, C++ wouldn't exist. – Alan Stokes Feb 17 '15 at 22:56
  • Several compilers already report more straightforward cases (which is really helpful). The general case is incomputable. – Alan Stokes Feb 17 '15 at 22:58
  • Sometimes there is no default value. E.g., any integer is as good as zero; there is no sensible choice. –  Feb 17 '15 at 23:01
  • "Has anyone ever considered..." Yes, I have. And I'm sure others have too. But making such a language a reality and then achieving any reasonable degree of adoption would be a significant challenge. – TheUndeadFish Feb 17 '15 at 23:23
  • I have too, but some variables are just better left uninitialized and make no sense to initialize. If you want to create variables and pass them by reference to a function that fills them in, for example, it makes no sense to initialize them outside of the function. Which is why I love the warning feature where you are warned if you access a variable which is not initialized. – Chris Feb 18 '15 at 00:13
  • At the same time we would make all data immutable (`const`) by default. The way to do all of this is to create a new language, which _has_ been done... many times. – Lightness Races in Orbit Feb 18 '15 at 00:45
  • I asked a similar question here: http://stackoverflow.com/questions/19235864/why-c-default-initialization-doesnt-zero-initialize-non-class-type-members – swang Feb 18 '15 at 01:41
  • Actually, you don't need to create a new language to do this. A new **compiler** is sufficient. Since the Standard leaves the behavior open, a compiler may define that there are no uninitialized variables simply by initializing them all to 0 (suitably converted). This would be encouraging unportable code, but that is not exactly a commercial disadvantage. – MSalters Feb 18 '15 at 12:30
  • @DieterLücking, I don't quite agree, see what happens when you write `int x = int();` or `std::complex y;`. Here both `x` and `y` variables are initialized using `0` as _neutral element_. And if there's truly no default value one could use `boost::optional` etc. – Adam Romanek Feb 20 '15 at 20:24
  • Variables **are** "initialized by default". In fact, they are *default-initialized* by default. The only possibly surprising part is that default-initialization performs no initialization on scalar types so that they are left in an indeterminate value. And that's precisely what you sometimes want (e.g. when you create a large buffer to receive data, or storage in which you create objects). – Kerrek SB Apr 11 '15 at 11:33
  • @KerrekSB, I fully understand what you say. I just think this makes coding in C++ harder than it should be. My point was to make default-initialization for scalar types do zero-initialization and allow the user to tell the compiler not to perform the zero-initialization when really needed. With such behavior a broad class of bugs related to the use of uninitialized values would be reduced to a minimum. – Adam Romanek Apr 12 '15 at 19:39
  • Hm, I'm not sure how I feel about those bugs. The bugs are essentially due to the programmer not thinking properly about their implementation. If it were just a matter of a programmer assuming that variables are zero by default, then a small amount of education could fix that. My suspicion is that most of the bugs are due to people not thinking properly about all the ins and outs of the problem they're trying to solve. If that's true, then giving some (or any) default value may in fact obscure thinking errors, because the program will run well-definedly but do the wrong thing... – Kerrek SB Apr 12 '15 at 19:50
  • @KerrekSB, I couldn't agree with you more. In fact, as you can see in the description preceding my questions, I also suggested another approach to the problem of default-initialization of primitive types. In this approach the compiler would require them to be explicitly initialized. Leaving them uninitialized would cause a compiler error, unless one would use something like `std::uninitialized_value`. Given that it's been a few weeks since I've asked this question, my opinion is now that this is the preferred option, as it doesn't leave much room for an error. – Adam Romanek Apr 15 '15 at 19:04
  • Honestly, I don't really understand why my question was downvoted... It's just an idea and I wanted to know if it's a new one. – Adam Romanek Apr 15 '15 at 19:08

2 Answers

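Use -Werror=uninitialized (GCC and Clang). It turns the compiler's warning about reading an uninitialized variable into a hard error, which is close to what you want.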

You can then “uninitialize” a variable with unsigned int x = x;
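A minimal sketch of the kind of code the flag is about. The function name `scale` and the command line in the comment are assumptions for illustration, and whether a particular read is diagnosed depends on the compiler version and optimization level.

```cpp
// Assumed build command: g++ -Wall -Werror=uninitialized demo.cpp
int scale(int input) {
    int factor;                 // declared without an initializer
    // return input * factor;   // uncommenting this read is what the flag is meant to reject
    factor = 2;                 // assigned before the first read, so this is accepted
    return input * factor;
}
```

The diagnostic is about reads the compiler can trace, not about the declaration itself, so a declaration followed by an assignment stays legal.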

StenSoft
  • Yes, just like using an uninitialized variable. It was a response to OP's proposal of `std::uninitialized_value`. – StenSoft Feb 18 '15 at 00:47
  • Not really. You can safely leave a variable uninitialised and assign to it at a later date before reading from it. But your version is UB right from the declaration, and I doubt that was the OP's intention with `std::uninitialized_value`. – Lightness Races in Orbit Feb 18 '15 at 01:06
  • Self initialization is not UB, see [3.3.2] – StenSoft Feb 18 '15 at 01:28
  • That passage says _"Here the second x is initialized with its own (indeterminate) value."_ and reading an indeterminate value is generally UB. The only guarantee you're being given is that `x` already refers to itself in its own initialiser. – Lightness Races in Orbit Feb 18 '15 at 01:33
  • Reading an indeterminate value is not UB, using it can be (computing with it, comparing an uninitialized `bool` etc.). But here it is not used in any way that would trigger UB. – StenSoft Feb 18 '15 at 01:51
  • As @LightnessRacesinOrbit said this is UB, please read [Does initialization entail lvalue-to-rvalue conversion? Is `int x = x;` UB?](http://stackoverflow.com/q/14935722/1708801) and [Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?](http://stackoverflow.com/q/23415661/1708801). The exception being in the case of narrow characters. – Shafik Yaghmour Feb 18 '15 at 02:31
  • As for the answer itself... The compiler can produce a warning when a variable is declared (but not initialized) and then used in the same translation unit. Otherwise the compiler can't tell us anything, because it can't analyze all the paths (and it only reports the USE of uninitialized variables, not the mere declaration). – Adam Romanek Feb 18 '15 at 21:23

Have you ever heard about any proposal like this?
Does such a proposal make sense at all?
Do you see any downsides of this approach?

I haven't heard of any such proposal. To answer the next two, I'll first broadly claim that neither makes sense. To more specifically answer, we need to break the proposal down into two separate proposals:

  1. No uninitialized variables of any kind are allowed.
  2. "Uninitialized" variables are secretly initialized to some well-defined value (e.g. as in Java, where fields default to false, 0, or null).

Case 1 has a simple counterexample: performance. Examples abound; here's a (simplified) pattern I use in my code:

Foo x;
if (/*something*/) {
    x = Foo(...);
} else {
    x = Foo(...);
}
//[do something with x]

x is "uninitialized" here. The default constructor does get called (if Foo is a class type), but it might do nothing. If it's a big structure, then all I really pay for is the cost of the assignment operator, not the cost of some nontrivial constructor plus the cost of the assignment operator. If it's a typedef for an int or something, forced zeroing means I need to pull it into cache and clear it (xor eax,eax plus a store, or whatever) before the real value overwrites it. If there's more code in the if-blocks, that's potentially a costly cache miss if the compiler can't elide the dead store.
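To make the cost of the pattern above visible, here is a small sketch. The `Foo` below is a hypothetical stand-in that counts default constructions; the function names and the string arguments are assumptions made for illustration only.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for Foo that counts default constructions.
struct Foo {
    static int default_ctor_calls;
    std::string payload;

    Foo() { ++default_ctor_calls; }
    explicit Foo(std::string p) : payload(std::move(p)) {}
};
int Foo::default_ctor_calls = 0;

// The pattern from the answer: default-construct, then assign in a branch.
Foo declare_then_assign(bool flag) {
    Foo x;
    if (flag) {
        x = Foo("left");
    } else {
        x = Foo("right");
    }
    return x;
}

// For comparison: initializing directly, so no default construction happens.
Foo initialize_directly(bool flag) {
    return flag ? Foo("left") : Foo("right");
}
```

The first function pays for one default construction plus an assignment, the second only for one construction. Direct initialization is not always available, for example when the value is found inside a loop.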


Case 2 is trickier.

It turns out that modern operating systems zero memory pages before handing them to a process (it's a security thing). So, a rudimentary form of this already happens for freshly mapped memory.

You mention in particular that such an approach would result in [fewer] bugs in our code. I fundamentally disagree. Why would magically initializing things to some well-defined value make our code more robust? In fact, this semi-automatic initialization causes an enormous quantity of errors.

Storytime!

Exactly how this works depends on how the program is compiled (so debug vs. release and compiler). When I was first learning C++, I wrote a software rasterizer and concluded it worked--since it worked perfectly in debug mode. But, the instant I switched to release mode, I got a completely different result. Had the OS not initialized everything to zero so consistently in debug mode, I might have realized this sooner. This is an example of a bug caused by what you suggest.

By some miracle I managed to re-find this depressing question, which demonstrates similar confusions. This is a widespread problem.

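Nowadays, some debugging environments fill memory with recognizable debug patterns. E.g. MSVC's debug runtime fills uninitialized stack memory with 0xCCCCCCCC and freshly allocated heap memory with 0xCDCDCDCD, and freed heap memory typically shows up as 0xFEEEFEEE. Lots more here. This, in combination with some OS sorcery, allows them to find use of uninitialized values. Running your code under an instrumentation tool (e.g. Valgrind's Memcheck) detects the same kind of error without recompiling, at the cost of slower execution.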


The larger point here is that, in my opinion, automatically initializing something to a well-defined value when you forget to initialize it is just as bad as (if not worse than) getting some bogus value. The problem is that the programmer is expecting something when he has no justification to, not whether the value he's expecting is well-defined.

geometrian
  • Your counterexample for case 1 is rather weak; it is trivially sidestepped by `Foo x = (condition) ? Foo(a) : Foo(b)`. A tougher case would be `Foo x; for (Foo& f : candidates) if (bar(f)) { x=f; break; }` – MSalters Feb 18 '15 at 12:34
  • Seems to me you missed my point. As for case 1 - you would still have a way to leave a variable uninitialized: `Foo x = std::uninitialized_value`. This would simply be a no-op, but would tell the compiler that this variable is left uninitialized on purpose (not because you might have forgotten to initialize it). You could still assign it later on, so there would be no performance cost. – Adam Romanek Feb 18 '15 at 21:34
  • As for case 2, i.e. using some well defined values... Please note that this is exactly what happens to all high-level types, e.g. std::vector, std::complex, std::unique_ptr etc. All of them when default-initialized end up in a KNOWN state (value). Moreover std::vector when resized value-initializes its data so they end up in a well-defined state too (they're not left uninitialized). My point was to follow the same path for primitive types. Honestly, why should `Foo* x;` behave differently from `std::unique_ptr<Foo> x;`? Why can't these two share the same semantics? – Adam Romanek Feb 18 '15 at 21:41
  • @AdamRomanek Case 1: ah, I misinterpreted your intention. I assumed `std::uninitialized_value` would actually have some well-defined value, but be semantically invalid. People do this already (with e.g. `-1`). For case 2, the performance argument of 1 also applies. From the perspective you just mentioned though, I think the main objection would be philosophical. Making a variable is saying `give me memory`. Making an object says `build something for me`. Why _should_ they share the same semantics? – geometrian Feb 19 '15 at 01:40
  • @imallett they should share the same semantics as this would result in consistent behavior. Note that C++ would then be easier to learn and use, even for novice programmers, without sacrificing the flexibility it now offers. I would ask: why should the definition (without initialization) of a variable of a primitive, POD and non-POD type have different semantics? I understand that this idea breaks some fundamental aspects of C++ but all this is just theory, for now... – Adam Romanek Feb 19 '15 at 07:29
  • @AdamRomanek Again, they should have different semantics _because they're different things_. If you want to make them the same thing, you can do what Python does and make _everything_ an object and force _everything_ to be initialized. Python pays for that dearly in performance, but it's a great teaching language. – geometrian Feb 19 '15 at 16:27
  • @imallett, this is exactly why I also proposed `std::uninitialized_value` - it's there to allow you to force a variable to be left uninitialized, giving you the flexibility you have right now. So you don't lose anything, you can only gain something. And honestly, how often do you need to leave a variable uninitialized *for performance reasons*? I can agree there are certain cases when this is desired, but these are rather uncommon. – Adam Romanek Feb 20 '15 at 20:34
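
The "known state" that the comments above attribute to value-initialization and to library types can be checked with a short sketch. The function names are hypothetical, and the element types (`double`, `int`) are assumptions, since the comments do not spell out the template arguments.

```cpp
#include <complex>
#include <memory>
#include <vector>

// Each function returns true when the object starts in a known state.
bool value_initialized_int_is_zero() {
    int x = int();               // value-initialization of a scalar
    return x == 0;
}

bool default_complex_is_zero() {
    std::complex<double> y;      // default constructor sets both parts to 0
    return y.real() == 0.0 && y.imag() == 0.0;
}

bool default_unique_ptr_is_null() {
    std::unique_ptr<int> p;      // default constructor yields a null pointer
    return p == nullptr;
}

bool resized_vector_is_zeroed() {
    std::vector<int> v;
    v.resize(4);                 // new elements are value-initialized
    for (int element : v) {
        if (element != 0) return false;
    }
    return true;
}

// By contrast, a block-scope "int* raw;" holds an indeterminate value,
// so it is deliberately not read anywhere in this sketch.
```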