12

One of my friends sent this code to me, saying it doesn't work as expected:

#include<stdio.h>

void main()
{
    int a [10] ={23, 100, 20, 30, 25, 45, 40, 55, 43, 42};
    int sizeOfInput = sizeof(a)/sizeof(int);
    int b, outer, inner, c;
    printf("Size is : %d \n", sizeOfInput);

    printf("Values before bubble sort are  : \n");
    for ( b = 0; b &lt; sizeOfInput; b++)
        printf("%d\n", a[b]);

    printf("End of values before bubble sort... \n");

    for ( outer = sizeOfInput; outer > 0; outer-- )
    {
        for (  inner = 0 ; inner < outer ; inner++)
        {
        printf ( "Comparing positions: %d and %d\n",inner,inner+1);
        if ( a[inner] > a[inner + 1] )
        {
                int tmp = a[inner];
                a[inner] = a [inner+1];
            a[inner+1] = tmp;
            }
        }
        printf ( "Bubble sort total array size after inner loop is %d :\n",sizeOfInput);
        printf ( "Bubble sort sizeOfInput after inner loop is %d :\n",sizeOfInput);
    }
    printf ( "Bubble sort total array size at the end is %d :\n",sizeOfInput);
    for ( c = 0 ; c < sizeOfInput; c++)
        printf("Element: %d\n", a[c]);
}

I am using Micosoft Visual Studio Command Line Tool for compiling this on a Windows XP machine. cl /EHsc bubblesort01.c

My friend gets the correct output on a dinosaur machine (code is compiled using TCC there).
My output is unexpected. The array mysteriously grows in size, in between.

If you change the code so that the variable sizeOfInput is changed to sizeOfInputt, it gives the expected results!

A search done at Microsoft Visual C++ Developer Center doesn't give any results for "sizeOfInput".

I am not a C/C++ expert, and am curious to find out why this happens - any C/C++ experts who can shed some light on this?
Unrelated note: I seriously thought of rewriting the whole code to use quicksort or merge sort before posting it here. But, after all, it is not Stooge sort...

Edit: I know the code is not correct (it reads beyond the last element), but I am curious why the variable name makes a difference.

Sujith Surendranathan
  • 2,569
  • 17
  • 21

3 Answers3

29

Like interjay's answer mentioned, once you get into undefined behavior all bets are off. However, when you said that merely renaming the variable changed the behavior of the program, I got curious about what was going on (undefined or not).

First, I didn't believe that renaming the variable would change the compiler's output (all other things being equal), but sure enough - when I tried it, I was surprised to see exactly what you described.

So I had the compiler dump the assembly for the code it was generating for each version of the source file, and ran a comparison. Here's what I found in the compilers description of how it was laying out the local variables:

    ***** test.sizeOfInput.cod
    _c$ = -56                    ; size = 4
    _b$ = -52                    ; size = 4
    _inner$ = -48                ; size = 4
    _a$ = -44                    ; size = 40
>>> _sizeOfInput$ = -4           ; size = 4
    _main   PROC
    ***** test.sizeOfInputt.cod
    _c$ = -56                    ; size = 4
>>> _sizeOfInputt$ = -52         ; size = 4
    _b$ = -48                    ; size = 4
    _inner$ = -44                ; size = 4
    _a$ = -40                    ; size = 40
    _main   PROC
    *****

What you'll notice is that when the variable is named sizeOfInput, he compiler places it at a higher address than array a (just past the end of the array), and when the variable is named sizeOfInputt it places it at a lower address than array a instead of just past the end of the array. That means that in the build that has the variable named sizeOfInput, the undefined behavior that occurs when you modify a[10] is changing the value of sizeOfInput. In the build that uses the name sizeOfInputt, since that variable isn't at the end of the array, the write to a[10] trashes something else.

As to why the compiler would lay out the variables differently when the name of one changes in an apparently insignificant way - I have no idea.

But this is a good example of why you shouldn't count on the layout of local variables (or pretty much any variables, though you can count on the layout order of struct elements), and why when it comes to undefined behavior, "it works on my machine" doesn't cut it as proof that something works.

Community
  • 1
  • 1
Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • Michael, this helps! This is what I was looking for. Accepting your answer. Thanks for taking the time to research this. – Sujith Surendranathan Jan 02 '11 at 15:21
  • 8
    Probably all variables are put into some form of hash map and the hashes of `sizeOfInput` and `sizeOfInputt` are simply different. – D.R. Mar 19 '14 at 15:50
7

Your code reads past the end of the array. The maximum value of outer is 10, and the maximum value of inner is 9, so a[inner+1] will read a[10]. This will give you undefined behavior, which explains why different compilers give different results.

As for the variable name making a difference: It probably doesn't. It's possible that if you run the same code twice (using the same variable name), you will get different results. Basically, when invoking undefined behavior, you can't be sure of anything that your program will do, so it's best not to try and find meaning in things like variable names.

There's also a chance that the variable name does make a difference. That depends on the implementation of the compiler: For example, using a different variable name might make the compiler organize memory differently somehow, which could cause the program to bahave differently. I think most compilers would output the same code if you change variable names though, so it was probably just a matter of luck.

interjay
  • 107,303
  • 21
  • 270
  • 254
  • interjay: Thanks, I updated my post. I know the code reads one more element - it is obvious from the output. I am puzzled about the name of the variable making such a difference. – Sujith Surendranathan Jan 01 '11 at 20:57
  • @interjay: Saw your update: no, if I name the variable as something else, it compiles and gives me the right answer the very first time itself. Also, recompiling with "sizeOfInput" didn't make any difference. – Sujith Surendranathan Jan 01 '11 at 21:05
  • @Sujith: Even if that's true, it's nearly impossible to tell for sure why it happens without examining the source code of the compiler. I made another edit to give an example. – interjay Jan 01 '11 at 21:10
  • @interjay: Thanks for the update. As I mentioned in my post, it works with Turbo C compiler, with ANY variable name. With MS Visual Studio also, it works with ANY variable name, except "sizeOfInput". It must be because of the way Microsoft implemented their compiler, but I was wondering whether any C/C++ experts have faced this same behavior before. – Sujith Surendranathan Jan 01 '11 at 21:17
  • @Sujith: C++ experts will just avoid the undefined behavior first, and then there won't be such problems :) interjay gave a good explanation of what might be happening – Roman L Jan 01 '11 at 21:49
1

Michael Burr's reply exposes an interesting behavior of the compiler. But I still doubt the local variable name could affect its layout in the stack for the function's active record.

I am using VC++ 2013 with command line above (cl.exe /EHsc progam.cpp) and cannot repro it - changing variable name doesn't change the program behavior, instead, it shows random crash stably (some runs returns the results well and some runs crash).

The reason for the above random crash is __security_cookie is stored directly above (larger address) the array a, and as a is defined as singed integer array, the result will depend on the sign bit (mis-interpreted) of the value of __security_cookie. If it is a positive integer larger than 100, it is still the largest value in the array a, so sort will not switch it to other positions, then the check (__security_check_cookie) at the end of the function will be fine. If it is less than 100 or negative integer, it will be switched to lower element in the array after sort so __security_check_cookie reports failure. This means the program behavior depends on the random generated value for __security_cookie.

I changed the original test program to below to ease testing. I also extended the output to include the off-by-one element (arrayLen + 1), and we could predict the behavior based on the original value in the element after the defined array a.

#include<stdio.h>
#define arrayLen sizeOfInputt

void main()
{
    int a [10] ={23, 100, 20, 30, 25, 45, 40, 55, 43, 42};
    int arrayLen = sizeof(a)/sizeof(int);
    int b, outer, inner, c;
    printf("Size is : %d \n", arrayLen);

    printf("Values before bubble sort are  : \n");
    for ( b = 0; b < arrayLen + 1; b++)
        printf("%d\n", a[b]);

    printf("End of values before bubble sort... \n");

    for ( outer = arrayLen; outer > 0; outer-- )
    {
        for (  inner = 0 ; inner < outer ; inner++)
        {
        printf ( "Comparing positions: %d and %d\n",inner,inner+1);
        if ( a[inner] > a[inner + 1] )
        {
                int tmp = a[inner];
                a[inner] = a [inner+1];
            a[inner+1] = tmp;
            }
        }
        printf ( "Bubble sort total array size after inner loop is %d :\n",arrayLen);
        printf ( "Bubble sort arrayLen after inner loop is %d :\n",arrayLen);
    }
    printf ( "Bubble sort total array size at the end is %d :\n",arrayLen);
    for ( c = 0 ; c < arrayLen; c++)
        printf("Element: %d\n", a[c]);
}
Community
  • 1
  • 1
Thomson
  • 20,586
  • 28
  • 90
  • 134