4

When we initialize an array like this: int a[5] = {0}, the compiler sets all 5 elements to 0. That is a really good, compact, and useful feature.

But I wonder why the compiler doesn't initialize int a[5]={1} similarly. Why does it not make all 5 elements 1? Why doesn't the Standard mandate it? Would it not have been an awesome feature? Isn't it missing?

Also, if the number of elements in the initializer is less than the size of the array, then the compiler could initialize the remaining elements with the last element in the initializer. That is, int a[5]={1,2,3} would be equivalent to int a[5]={1,2,3,3,3}. And similarly, int a[10]={1,2,3,0} would be equivalent to int a[10]={1,2,3,0,0,0,0,0,0,0};.

Would it all not be an awesome feature if the Standard mandated it? Or are there any good reasons for this missing feature?


And there is something called a designated initializer in C99, which is used like this:

Designated initializers can be combined with regular initializers, as in the following example:

int a[10] = {2, 4, [8]=9, 10};

In this example, a[0] is initialized to 2, a[1] is initialized to 4, a[2] to a[7] are initialized to 0, a[8] is initialized to 9, and a[9] is initialized to 10.

Quite interesting. But even this feature is not in C++.

finnw
Nawaz

4 Answers

10

Why does it not make all 5 elements 1?

Because you're misunderstanding what {} means. (Actually, in C++ the better way to do this is {} rather than {0}.) The syntax {0} does not mean that you want all elements in the aggregate set to zero. Rather, it says that you want an aggregate whose first element is zero, assigned to the indicated variable (which can be either an array or a class type in C++). Because the aggregate usually has more fields than that one value zero, the remaining elements in the aggregate are default constructed (strictly speaking, value-initialized). The default value of a builtin or POD type is to set all of the fields to zero, so you've effectively set the entire aggregate to zero.
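A minimal sketch of that rule applied to the arrays from the question. The helper names tail_is_zero and demo_partial_initializers are made up for illustration; the asserts only restate what the current rules guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if every element of a[first..N) is zero.
template <std::size_t N>
bool tail_is_zero(const int (&a)[N], std::size_t first) {
    for (std::size_t i = first; i < N; ++i) {
        if (a[i] != 0) return false;
    }
    return true;
}

// Exercises the initializer forms discussed in the question.
void demo_partial_initializers() {
    int zeros[5] = {};         // no initializers: all five elements are 0
    int one[5] = {1};          // only one[0] is 1; one[1] to one[4] are 0
    int mixed[5] = {1, 2, 3};  // mixed[3] and mixed[4] are 0, not 3

    assert(tail_is_zero(zeros, 0));
    assert(one[0] == 1 && tail_is_zero(one, 1));
    assert(mixed[2] == 3 && tail_is_zero(mixed, 3));
}
```

So {1} is not a "fill with 1" request; it names the first element only, and the rest are zeroed exactly as they are with {0}.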

As for why specifically, consider the following. According to the current standard, none of the assertions below will fail:

#include <cassert>

struct abc
{
    char field1;
    int field2;
    char field3;
};

int main()
{
    abc example = {'a', static_cast<int>('b')};
    //All three asserts pass
    assert(example.field1 == 'a');
    assert(example.field2 == static_cast<int>('b'));
    assert(example.field3 == '\0');

    int example2[3] = {static_cast<int>('a'), 42};
    assert(example2[0] == static_cast<int>('a'));
    assert(example2[1] == 42);
    assert(example2[2] == 0);
}

What would you expect the value of field3 to be in your proposed standard change? Even if you define it as the last element in the aggregate initializer as you've shown above, that's going to break compatibility with existing code which assumes the rest of the elements are default constructed.
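To make the compatibility concern concrete, here is a small hypothetical example (label_length is an invented name, not code from the question or from the linked snippets). It is well defined today only because the elements that are not listed are zero.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: builds a short label in a fixed-size buffer and
// returns its length. It relies on the unlisted elements being '\0'.
std::size_t label_length() {
    char label[8] = {'h', 'i'};  // label[2] to label[7] are '\0' today
    return std::strlen(label);   // finds the terminator at label[2]
}
```

Under a "repeat the last initializer" rule the buffer would hold 'h' followed by seven 'i' characters with no terminator, so the strlen call would read past the end of the array. The code would still compile without any diagnostic.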


EDIT: Just realized that your question is asked in terms of arrays - but the answer is the same with either structures or arrays, so it really doesn't matter.

EDIT2: To make this more in keeping with the standard, references to class/structure have been replaced with "aggregate" below, which covers the structures and arrays cases.

Billy ONeal
  • @Billy: I was specifically talking about arrays, not structs :| – Nawaz Jan 23 '11 at 07:16
  • @Nawaz: Answer doesn't change. Aggregate initializers are valid for both structures and arrays, and they obey the same rules in each case. – Billy ONeal Jan 23 '11 at 07:17
  • 2
    @Billy: they obey same rules now, but they could follow different rules if the standard makes it. – Nawaz Jan 23 '11 at 07:19
  • @Nawaz: Except that would break existing code which relies on the remaining elements in the array being default constructed. And what's worse is that such code would not fail at compile time, only at runtime. – Billy ONeal Jan 23 '11 at 07:21
  • @Billy: if you're talking about existing code written in C, then many C++ features break it already. As for runtime fail, what exactly are you talking about? Could you please elaborate? – Nawaz Jan 23 '11 at 07:25
  • @Nawaz: What I mean is, the standards committee is not going to make a change like this that silently breaks existing code, for a little convenience like this. If you want the elements in your array set, use `std::fill` or `std::fill_n` -- that's why they're there! – Billy ONeal Jan 23 '11 at 07:25
  • @Nawaz: No, I'm not talking about existing code written in C at all. It would break in C++ too. As for compile time, I mean there's no way for the compiler to diagnose that your code will break if you were relying on the zero initialization before. – Billy ONeal Jan 23 '11 at 07:26
  • @Billy: would the C++0x standard not break [some] existing C++03 code? – Nawaz Jan 23 '11 at 07:27
  • @Nawaz: Yes, C++0x has the possibility of breaking C++03 code. But all of the breaks result in your code failing to compile, not silent runtime failures. For example, this program ( http://codepad.org/zz3coJze ) works fine now, but with your proposed change would cause undefined behavior. And there's no way for a compiler to check for it. – Billy ONeal Jan 23 '11 at 07:28
  • @Billy: Please remove the struct explanation, as it doesn't at all address the question what I've asked. At best, it deviates the topic from array-initialization to struct-initialization. The current standard might be treating both in the same way, but I'm specifically talking about arrays, and if that needs different treatment, then discuss that instead. :-) – Nawaz Jan 23 '11 at 07:49
  • @Nawaz: No, I'm not removing anything. It's pertinent to the question you asked because structures and arrays follow the same rules here. You want a one line answer? Fine: Because the current standard doesn't work that way, and changing it would silently break an unknowably large amount of existing code. – Billy ONeal Jan 23 '11 at 07:52
  • @Billy "...default constructed. The default value of a builtin or POD type is to set all of the fields to zero...". Are you 100% sure of this? I have had dozens of bugs because default construction did not initialize POD fields to zero, so I just can't believe no one has corrected you. (So I am probably mistaken.) But could you elaborate, please? I am pretty sure default construction of a POD does nothing about initialisation. – Stephane Rolland Jan 23 '11 at 07:56
  • @Billy: *"No, I'm not removing anything. It's pertinent to the question you asked because structures and arrays follow the same rules here."*. Then why do you post to begin with? :P ...I **do know** that what I've asked is not allowed by the Standard. All you're talking about is what is in the Standard, whereas I'm talking about why something is not there. The Standard could have been different; the Standard could have made it so and so. – Nawaz Jan 23 '11 at 07:59
  • @Stephane: Yes, I'm sure. See this example ( http://codepad.org/iGyVQuya ). On my system this writes (bro4@ubuntu:~$ ./a.out Uninitialized value: -1074912856 Default constructed value: 0) (Codepad won't run this because it doesn't like reading the uninitialized value) – Billy ONeal Jan 23 '11 at 08:07
  • @Billy, great example, thanx. Thx for pointing this out to me. – Stephane Rolland Jan 23 '11 at 08:17
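
The std::fill / std::fill_n suggestion from the comments can be sketched roughly as follows. The wrapper names fill_all and repeat_last are invented for illustration; only std::fill and std::fill_n come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: sets every element of a to value,
// which is what the question hoped {1} would do.
template <std::size_t N>
void fill_all(int (&a)[N], int value) {
    std::fill(a, a + N, value);
}

// Hypothetical wrapper: overwrites a[first..N) with the value of
// a[first - 1], emulating the proposed "repeat the last initializer" rule.
template <std::size_t N>
void repeat_last(int (&a)[N], std::size_t first) {
    assert(first >= 1 && first <= N);
    std::fill_n(a + first, N - first, a[first - 1]);
}
```

With these, int a[5]; fill_all(a, 1); yields five ones, and int b[5] = {1, 2, 3}; repeat_last(b, 3); yields 1, 2, 3, 3, 3, without any change to the meaning of the initializer list itself.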
4

Yes, they could have done that, but they didn't, and it's far too late now to change such behavior. The decisions behind C and C++ were made with thought given to performance and minimalism at almost every step, so I imagine that, if nothing else, comes into play here as well.

Such a feature just doesn't strike me as all that awesome. It's a very simple piece of syntactic sugar, and I rarely find the need to initialize an array like that to anything other than 0 anyway.

MikeP
  • That is not convincing. Such initialization doesn't hinder performance. – Nawaz Jan 23 '11 at 07:08
  • @Nawaz: the key here is "they could have done that, but they didn't". I doubt that DMR, Kernighan or someone similarly involved in C's early design decisions has given a rationale for this particular bit of C's behavior. I'd be surprised if they gave this issue much thought other than that it's the way they decided it should work. – Michael Burr Jan 23 '11 at 07:41
2

Typical runtime libraries provide a feature that makes it easy to initialise data to 0. In general terms, such data is assigned to a specific section in the executable, organised by the compiler and linker. At program startup, the runtime startup code uses something like memset() to clear all of the data in that section to 0. This means that the zero bytes don't have to be stored inside the executable itself.

The converse is that if you initialise data to something other than zero, then the bytes for that data must be stored in the executable itself, since the automatic initialiser only initialises to zero.

Therefore, if you were to declare a big array of char (say a megabyte?) and initialise it with, say, {0}, then there would not be bytes stored in the executable for that array. On the other hand, if you were to initialise it with {1} under your scheme, a megabyte of 1 bytes would have to be stored in the executable itself. By changing one character in the initialiser list, the size of the executable increases by a megabyte.

I believe such a scheme would violate the principle of least surprise.
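
A rough sketch of the two declarations being compared, using the current rules. The names kSize, zero_buffer, one_buffer and buffers_match_current_rules are invented for illustration, and the section placement described in the comments is an assumption about typical toolchains rather than something the language guarantees.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSize = 1024 * 1024;  // one megabyte, as in the answer

// All zero: a typical toolchain can place this in a zero-filled section
// that occupies no space in the executable file (assumption).
char zero_buffer[kSize] = {0};

// First byte nonzero: the array now has a nonzero initial image, so a
// typical toolchain may store it with the initialised data (assumption).
char one_buffer[kSize] = {1};

// Checks the values the current rules guarantee: zero_buffer is all zero,
// and one_buffer is a single 1 followed by zeros.
bool buffers_match_current_rules() {
    if (one_buffer[0] != 1) return false;
    if (zero_buffer[0] != 0) return false;
    for (std::size_t i = 1; i < kSize; ++i) {
        if (zero_buffer[i] != 0 || one_buffer[i] != 0) return false;
    }
    return true;
}
```

Comparing the size of the resulting executable with and without one_buffer is one way to check what a particular compiler and linker actually do.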

Greg Hewgill
  • 1
    True in the case of globals, but I don't believe it's true in the case of stack allocated variables. (I also don't think anything like this happens on Windows but I could be wrong) – Billy ONeal Jan 23 '11 at 07:35
  • Zero-initialization is *required* by the language, for applicable variables. But it doesn't appear to apply here, anyway, and it doesn't seem hard to avoid a megabyte of ones by the compiler generating a for loop to initialize, if that was desired. – Fred Nurk Jan 23 '11 at 08:10
0

I personally find it more "logical" (i.e. simple) to have a fixed default initializer than to add another rule, just for arrays, that repeats the last element. The latter may appear "practical" (i.e. useful), but it's IMO more logically complex.

That said however I think you're making a big mistake in trying to apply logic to a language like C++.

C++ is a complex language whose rules are the result of a long evolution history, and its current form is the result of the work of many people and even of formal committees (the last part alone could explain anything).

A language like C++ cannot be inferred by logic, it must be studied like history. Unless you're Hari Seldon there's really no way you can infer history using logical reasoning.

There are places of C++ where you're going to suffer a lot if you try to use logic instead of studying. Just to name a few...

  • Why is default dispatch static (i.e. wrong)?
  • Why is there no keyword for the null pointer?
  • Why is the difference of two unsigned values unsigned?
  • Why is the sum of a signed and an unsigned value unsigned?
  • If unsigned means "element of Z_{2^n}", then why are sizes unsigned?
  • Why is std::string s; s=3.141592654; perfectly valid C++?
  • Why, in C++0x, is i = i++ + 1; undefined behavior while i = ++i + 1; is valid?
  • Why doesn't double x=3.14; int y(int(x)); mean y will be 3?
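
As a small illustration of the two unsigned-arithmetic items in the list, here is a sketch with invented function names (unsigned_difference and mixed_sum). It only shows what the rules do, not whether they are a good idea.

```cpp
#include <cassert>
#include <limits>

// The difference of two unsigned values is unsigned, so it wraps
// modulo 2^n instead of becoming negative.
unsigned unsigned_difference(unsigned a, unsigned b) {
    return a - b;
}

// In a mixed sum the int operand is converted to unsigned before adding,
// so the result is unsigned as well.
unsigned mixed_sum(int i, unsigned u) {
    return i + u;
}

void demo_unsigned_rules() {
    const unsigned max = std::numeric_limits<unsigned>::max();
    assert(unsigned_difference(2u, 3u) == max);  // 2 - 3 wraps around
    assert(mixed_sum(-2, 1u) == max);            // -2 converts to max - 1
}
```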
6502
  • I agree with the overall assertion that "why is C++ this way" questions must be tackled with historical rather than logical techniques, but your long list of debatable design decisions in C++ seems like unnecessary flamebait to me. – zwol Feb 04 '11 at 19:31
  • Considering many of the things in the list make perfect sense, and some of the things in the list are completely wrong, I have to agree with Zack. -1 – Billy ONeal May 13 '12 at 04:56
  • @BillyONeal: Which of the elements of the list do you think are wrong? I will be happy to correct them... – 6502 May 13 '12 at 06:40
  • @Zack: I am sorry if you think that listing illogical design decisions about C++ is flamebait. Flames about these points only happens when someone that recently studied C++ is still in the infatuation period where everything in C++ is just perfect. Getting past that stage is IMO automatic once someone writes enough real production code in C++. – 6502 May 13 '12 at 06:52
  • @6502: For instance, your last example (int y(int(x))) is vexing. GCC (for some reason) assigns y to be one, but a conformant toolchain will fail to link such code unless a function called "y" exists somewhere in the sources. There is a keyword for the null pointer. All of the things with signed/unsigned conversions are no different than any other programming language (e.g. if the rules were any other way people would still complain), and compilers that aren't bad will already warn about such things in the right places, assuming warnings are turned on. Etc. – Billy ONeal May 15 '12 at 02:05
  • @6502: C++ is not a perfect language, and there are plenty of valid things to rip on (e.g. why on earth would exception specifications work the way they do in a sane world, [null terminated strings](http://stackoverflow.com/questions/4418708/whats-the-rationale-for-null-terminated-strings), etc.), but what you listed here are piddily nit-picky things you're going to find loads of in every programming language. (At least, every language I've ever seen) – Billy ONeal May 15 '12 at 02:06
  • @BillyONeal: Ok, so apparently you found nothing "completely wrong" (the tag for the question is C++, not C++11, and the null pointer "keyword" handling in C++11 seems incredible but managed to make things even more complex, special cased and illogical). The example you named vexing is indeed "the most vexing C++ parsing rule" and it's truly horrendous from a technical point of view (there's no way to express that in say BNF and goes "if something can be a declaration then it's a declaration"), giving C++ an ambiguous grammar for no good reasons. Sure no language is perfect, so? – 6502 May 16 '12 at 06:43
  • @6502: C++ \*is\* C++11. And I agree that the vexing parse is annoying. What's wrong with your example is that the example is not valid C++, as it will not compile on a standards compliant compiler. C++ does not have an ambiguous grammar. It is ambiguous to yacc/bison because these tools use LALR(1) parsing, which are relatively restrictive parsing models. (In fact, some of the more annoying bits in C++, such as when and when not to use `typename` in templates, are there specifically to remove ambiguities in the grammar). – Billy ONeal May 16 '12 at 17:28
  • @BillyONeal: That code *IS* valid C++... simply the meaning is not what a programmer would expect. And the reasons are quite illogical (i.e. pointless support for local external function declarations, pointless optional and ignored parenthesis around parameter names, pointless double syntax available for initialization). The C++ grammar is so bad that it took YEARS just to have compilers to agree on what is C++ and what is not. In C++ you may have to parse and do semantic analysis of an arbitrary amount of tokens to decide what is the meaning of the very first of them. – 6502 May 16 '12 at 19:25