
When I initialize the array below, all the output looks OK except for values[3]. For some reason, values[3], initialized as values[0]+values[5], outputs a very large number. My guess is that I am trying to assign values[0]+values[5] before they are properly stored in memory, but if someone could explain, that would be great.

#include <stdio.h>

int main (void)
{

    int values[10] = { 
        [0]=197,[2]=-100,[5]=350,
        [3]=values[0] + values[5],
        [9]= values[5]/10
    };

    int index;

    for (index=0; index<10; index++)
        printf("values[%i] = %i\n", index, values[index]);


    return 0;
}

The output is as follows:

values[0] = 197
values[1] = 0
values[2] = -100
values[3] = -1217411959
values[4] = 0
values[5] = 350
values[6] = 0
values[7] = 0
values[8] = 0
values[9] = 35
Shafik Yaghmour
Luke Murray
  • W-what is this crazy array initialization syntax? How on earth is this legal and compiling? – tux3 Mar 02 '15 at 15:39
  • @tux3: https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits.html – Mat Mar 02 '15 at 15:40
  • It is gcc compiler specific; see: http://stackoverflow.com/questions/201101/how-to-initialize-an-array-in-c the second answer. – hetepeperfan Mar 02 '15 at 15:41
  • @tux3, It's the designated initialization syntax introduced in C99. – ach Mar 02 '15 at 15:41
  • Ah, I should have known. Thanks @all. – tux3 Mar 02 '15 at 15:42
  • Are you sure the values [0] and [5] already have their values when it is coming to [3]? I guess it is not guaranteed. – Eugene Sh. Mar 02 '15 at 15:45
  • @Mat: Being "crazy initialization syntax" and "in C99" are not mutually exclusive conditions. If something is in the standard, but is really stupid, or just less clear than an alternative, why use it? – jamesqf Mar 02 '15 at 19:15
  • I do not. The answers were very thorough and well thought out. Thank you for asking! – Luke Murray Mar 19 '15 at 02:04

5 Answers


It looks like you are subject to unspecified behavior here, since the order of evaluation of the initialization list expressions is unspecified. From the draft C99 standard, section 6.7.8:

The order in which any side effects occur among the initialization list expressions is unspecified.133)

and note 133 says:

In particular, the evaluation order need not be the same as the order of subobject initialization.

As far as I can tell, the normative text that backs up note 133 would be from section 6.5:

Except as specified later [...] the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.

and we can see from 6.8 that an initializer is a full-expression (emphasis mine):

A full expression is an expression that is not part of another expression or of a declarator. Each of the following is a full expression: an initializer; [...]

After looking back at one of my old C++ answers that covered sequence points within an initializer, and which places the full-expression somewhere other than where I originally concluded, I realized that the grammar in 6.7.8 contains initializer twice:

initializer:
    assignment-expression
    { initializer-list }
    { initializer-list , }
initializer-list:
    designation_opt initializer
    initializer-list , designation_opt initializer

I originally did not notice this and thought the statement on full-expressions applied to the top element in the above grammar.

I now believe that, as in C++, the full-expression applies to each initializer within the initializer-list, which makes my previous analysis incorrect.

Defect report 439 confirmed my suspicion that this was indeed the case; it contains the following example:

#include <stdio.h>

#define ONE_INIT      '0' + i++ % 3
#define INITIALIZERS      [2] = ONE_INIT, [1] = ONE_INIT, [0] = ONE_INIT

int main()
{
    int i = 0;
    char x[4] = { INITIALIZERS }; // case 1
    puts(x);
    puts((char [4]){ INITIALIZERS }); // case 2
    puts((char [4]){ INITIALIZERS } + i % 2); // case 3
}

and it says:

In every use of the INITIALIZERS macro, the variable i is incremented three times. In cases 1 and 2, there is no undefined behavior, because the increments are in expressions that are indeterminately sequenced with respect to one another, not unsequenced.

so each initializer within INITIALIZERS is a full-expression.

Since this defect report is against C11, it is worth noting that C11 is more verbose than C99 in the normative text on this issue; it says:

The evaluations of the initialization list expressions are indeterminately sequenced with respect to one another and thus the order in which any side effects occur is unspecified.152)

There is undefined behavior in the case where the following expressions are evaluated before the respective elements in values are assigned to:

 values[0] + values[5]

or:

 values[5]/10

This is undefined behavior because those elements would still hold indeterminate values at that point, and using an indeterminate value invokes undefined behavior.

In this specific case the simplest work-around would be to perform the calculations by hand:

int values[10] = { 
    [0]=197,[2]=-100,[5]=350,
    [3]= 197 + 350,
    [9]= 350/10
};

There are other alternatives such as doing the assignments to element 3 and 9 after the initialization.

Community
Shafik Yaghmour
  • Even if the order is left-to right, the expression is equivalent to: `int values[10] = { 197, 0, -100, values[0] + values[5], 0, 350, 0, 0, 0, values[5]/10 };` , while here the dependent values are evaluated before the dependencies. – Eugene Sh. Mar 02 '15 at 16:00
  • @EugeneSh. well `6.7.8` [says](http://stackoverflow.com/a/25386873/1708801) the following `a designation causes the following initializer to begin initialization of the subobject described by the designator` – Shafik Yaghmour Mar 02 '15 at 16:04
  • @KubaOber It is up to the programmer to choose their ways. You can play it hardcore and use some "edgy" features, and then risk your sanity. But you can do the same things very well without even getting close there. – Eugene Sh. Mar 02 '15 at 16:06
  • @ShafikYaghmour Oh, so it's even better explanation. So no unspecified behaviour, everything is as expected. – Eugene Sh. Mar 02 '15 at 16:07
  • @EugeneSh. I firmly believe in languages that act as you mean. C/C++ mostly don't. They quite blatantly take what you'd mean and turn it into something else. When you write out a data dependency, you mean it, and a compiler has all the information needed to order the initializations such that the data dependency is satisfied, or to issue an error if the dependencies are circular or otherwise unsatisfiable. Instead, the language designers don't even bother to require a diagnostic for such code. Sadly, that's the philosophy of C/C++. There's no runtime performance reason for this - none at all. – Kuba hasn't forgotten Monica Mar 02 '15 at 16:10
  • @KubaOber, you cannot reasonably expect that compiler solves equations for you, can you? – ach Mar 02 '15 at 16:18
  • @KubaOber In the case of unspecified behavior, the rationale is often performance and/or being impartial. The standard committee didn't want to restrict compilers to evaluate some things in a particular order, as a lot of the optimization tricks in compilers depended on that, at some point. The core problem is that C is an ISO standard and therefore isn't allowed to favour one manufacturer instead of another. They cannot favour the sane manufacturer: the stupid manufacturer must be allowed to compete at equal terms (one's complement computers, anyone?). – Lundin Mar 02 '15 at 16:18
  • @AndreyChernyakhovskiy The compilers have been solving equations for decades now. Just look at modern register allocators, or look at how some aspects of functional languages are implemented. A modern compiler does some pretty heavy math, whether numerical or boolean, when compiling most boring code. – Kuba hasn't forgotten Monica Mar 02 '15 at 16:21
  • @KubaOber, that's true, but these special cases pose a quite narrow range of equations. In an initializer, one can write *anything*. – ach Mar 02 '15 at 16:31
  • Just for query, Is this valid `int a[] = {2, 5, [0] = 3, 6}`? – haccks Mar 02 '15 at 17:44
  • @Lundin added normative text – Shafik Yaghmour Mar 02 '15 at 21:17
  • @AndreyChernyakhovskiy I don't know why you even brought up solving any equations. Evaluating data dependencies doesn't imply solving arbitrarily complex equations. A data dependency is when a particular expression depends on the value of other expression(s). The complexity of the expression is otherwise irrelevant. Compilers *already* do data dependency analysis to constrain instruction motion, for example. – Kuba hasn't forgotten Monica Mar 02 '15 at 21:23
  • @KubaOber I don't think it is possible in all cases for the compiler to properly discover all data dependencies. This is just a best-efforts approach. It would be insane to define correctness based on such an unreliable mechanism. Consider `int a[3] = { [0] = x, [1] = f([0]), [2] = y }` where `f` is an arbitrary function, possibly a method of a polymorphic object (so we cannot analyze the code of `f`), which takes a reference to `[0]` - the zeroth cell of `a`, then takes its address, adds two, and dereferences it, therefore reads `[2]`. The compiler has no possibility to detect that. – ciamej Mar 02 '15 at 23:36
  • @ciamej method of a polymorphic object in C? The compiler knows the size of `values`, so it could detect that, but even putting that aside, that's no excuse to overlook direct references to `values` in the initializers for elements of `values`, which could and should get a diagnostic. – David Conrad Mar 02 '15 at 23:46
  • @DavidConrad oh, my bad; allow, then, `f` to be a function pointer whose value cannot be determined at that point, and you get the same result. – ciamej Mar 03 '15 at 00:03
  • @DavidConrad I agree that a warning would be welcome here, however KubaOber asked for the language to be designed in such a way that all dependencies are discovered and taken into account when evaluating expressions - this is clearly impossible. – ciamej Mar 03 '15 at 00:06
  • @ciamej The compiler can certainly detect that if `f` references what is effectively a compile-time constant. If the compiler can't do that, then other rules would apply. A very sane rule would be that if the compiler can't figure it out, you have to annotate a data dependency (or lack thereof) manually. That's why you have `restrict` in C99, although it's but a small step in the right direction. A sensible annotation would mean "assume the worst": the compiler must forgo certain optimizations, but you are explicitly allowing it to do so. So no hidden perf. penalties. – Kuba hasn't forgotten Monica Mar 03 '15 at 16:59
  • @ciamej Basically, the compiler can detect all dependencies where discovering them makes sense, and those where it doesn't make sense are left to you to assert whether data dependencies exist or not. – Kuba hasn't forgotten Monica Mar 03 '15 at 16:59

This has nothing to do with designated initializers as such. It is the same bug as you'd get when attempting something like this:

int array[10] = {5, array[0]};

The order in which initialization list expressions are evaluated is simply unspecified behavior, meaning it may vary between compilers, need not be documented, and should never be relied upon:

C11 6.7.9/23

The evaluations of the initialization list expressions are indeterminately sequenced with respect to one another and thus the order in which any side effects occur is unspecified.

Since you are using array items to initialize other array members, you must change your code to use run-time assignment instead of initialization.

  int values[10] = {0};   /* elements that are never assigned stay 0 */

  values[0] = 197;
  values[2] = -100;
  values[5] = 350;
  values[3] = values[0] + values[5];
  ...

As a side-effect, your program will now also be far more readable.

Lundin
  • Moreover, the cleverer compilers are still free to do constant and data propagation and effectively produce a pre-initialized array for you. – Kuba hasn't forgotten Monica Mar 02 '15 at 16:02
  • @KubaOber Indeed. Such an optimization is extremely likely. – Lundin Mar 02 '15 at 16:03
  • Another possible workaround would be `enum { V0 = 197, V5 = 350 }; int values[10] = { [0] = V0, [2] = -100, [5] = V5, [3] = V0 + V5, [9] = V5/10 };` – M.M Mar 03 '15 at 01:14
  • If there is an indeterminately-sequenced read and write for the same location, then it is undefined behaviour, so I think the original code is actually UB. (Unspecified but non-undefined behaviour would occur when the initializers were function calls, for example) – M.M Mar 03 '15 at 01:17
  • @MattMcNabb *indeterminately sequenced* is fine, unsequenced is not. – Shafik Yaghmour Mar 03 '15 at 03:34
  • @ShafikYaghmour thanks, didn't realize those were different things – M.M Mar 03 '15 at 04:03
  • @MattMcNabb It isn't the same location though, it is different items of the array. Code like `[0] = array[0],` is probably undefined behavior though. – Lundin Mar 03 '15 at 07:41

This is the first time that I have seen something initialized that way, but I figured that the behavior you are seeing had to do with accessing a piece of the array that has not yet been initialized. So I built it using GCC 4.6.3 on a 32-bit Ubuntu 12.04 system. In my environment, I got different results than you.

gcc file.c -o file

./file
values[0] = 197
values[1] = 0
values[2] = -100
values[3] = 197
values[4] = 0
values[5] = 350
values[6] = 0
values[7] = 0
values[8] = 0
values[9] = 35


objdump -d file > file.asm

cat file.asm     (relevant portion posted below)

080483e4 <main>:
 80483e4:   55                      push   %ebp
 80483e5:   89 e5                   mov    %esp,%ebp
 80483e7:   57                      push   %edi
 80483e8:   53                      push   %ebx
 80483e9:   83 e4 f0                and    $0xfffffff0,%esp
 80483ec:   83 ec 40                sub    $0x40,%esp
 80483ef:   8d 5c 24 14             lea    0x14(%esp),%ebx
 80483f3:   b8 00 00 00 00          mov    $0x0,%eax
 80483f8:   ba 0a 00 00 00          mov    $0xa,%edx
 80483fd:   89 df                   mov    %ebx,%edi
 80483ff:   89 d1                   mov    %edx,%ecx
 8048401:   f3 ab                   rep stos %eax,%es:(%edi)   <=====
 8048403:   c7 44 24 14 c5 00 00    movl   $0xc5,0x14(%esp)
 804840a:   00 
 804840b:   c7 44 24 1c 9c ff ff    movl   $0xffffff9c,0x1c(%esp)
 8048412:   ff
 8048413:   8b 54 24 14             mov    0x14(%esp),%edx
 8048417:   8b 44 24 28             mov    0x28(%esp),%eax
 804841b:   01 d0                   add    %edx,%eax
 804841d:   89 44 24 20             mov    %eax,0x20(%esp)
 8048421:   c7 44 24 28 5e 01 00    movl   $0x15e,0x28(%esp)
 8048428:   00 
 8048429:   8b 4c 24 28             mov    0x28(%esp),%ecx
 804842d:   ba 67 66 66 66          mov    $0x66666667,%edx
 8048432:   89 c8                   mov    %ecx,%eax
 8048434:   f7 ea                   imul   %edx
 8048436:   c1 fa 02                sar    $0x2,%edx
 8048439:   89 c8                   mov    %ecx,%eax
 804843b:   c1 f8 1f                sar    $0x1f,%eax

I've identified a key line in the above output (marked with <=====) that I think marks the difference between what your compiler generated and what mine generated. Before specific array elements are initialized with the values you specified, mine zeroes the contents of the array. The specific initialization of array elements occurs after this.

Given the above behavior, I do not think it is unreasonable to hypothesize that yours did not zero the array contents before initializing specific elements of the array. As for why the behavior differs, I can only speculate, but my first guess is that we are using different compiler versions.

Hope this helps.

Sparky
  • It is unspecified behavior so the results may turn out different. Nothing mysterious here. Simply don't write code that relies on it. – Lundin Mar 02 '15 at 16:20
int values[10] = { 
    [0]=197,[2]=-100,[5]=350,
    [3]=values[0] + values[5],
    [9]= values[5]/10
};

edit:

The ISO C99 standard, section 6.7.8 (Initialization) specifies that

The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;132) all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration

But as Shafik pointed out, the evaluation order doesn't have to match the initialization order.

This means that values[0] + values[5] may read garbage values from:

  • values[0]
  • values[5] (this is what happened in your case)
  • both
  • none of them
Community
UmNyobe
  • Is the behaviour unspecified, or undefined? – Bathsheba Mar 02 '15 at 15:55
  • @Bathsheba Unspecified. But still shouldn't be relied upon, since the compiler might actually behave differently from case-to-case basis when it comes to unspecified behavior. – Lundin Mar 02 '15 at 16:03

Try this code:

int values[10] = {0};  /* elements that are never assigned stay 0 */
values[0]=197;
values[2]=-100;
values[5]=350;
values[3]=values[0]+values[5];
values[9]=values[5]/10;

And then you print the array like you've done.

sentientmachine