-4

After providing an answer to a question here, I was testing this code that I edited and noticed some strange behavior:

#include <iostream>
#define MAX 100
using namespace std;

int main()
{
    int size = 0;
    int array[MAX];
    int i, j;
    int input;

    cout << "Array: ";
    for(i = 0; i < MAX; i++)
    {
        cin >> input;

        if(input == -1)
            break;
        else
        {
            array[i] = input;
            size++;
        }
    }

    cout << "Size: " << size << "\n\n";


    int left[size / 2];
    int right[size / 2];

    for(i = 0; i < size / 2; i++)
        left[i] = array[i];
    for(i = size / 2, j = 0; i < size; i++, j++)
        right[j] = array[i];

    cout << "Left: ";
    for(i = 0; i < size / 2; i++)
        cout << left[i] << ' ';
    cout << '\n';

    cout << "Right: ";
    for(i = 0; i < size - size / 2; i++)
        cout << right[i] << ' ';
    cout << '\n';

    return 0;
}

This code is supposed to split the array into two separate arrays. Somehow the output is wrong when these are the input:

1 2 3 4 5 6 7 8 9 -1
Left: 9  2  3  4
Right: 5  6  7  8  9

After debugging If the elements of left were printed like this:

for(i = size / 2, j = 0; i < size; i++, j++)
{
    right[j] = array[i];
    cout << left[0] << ' ';
}
cout << '\n';

It says that the value of left[0] is modified after the 5th iteration:

1 1 1 1 9
Left: 9 2 3 4
Right: 5 6 7 8 9

This only happens when the array size is 9. I haven't tested beyond 16 yet. I could fix the code so that it would have the correct size

int right[size - size / 2];

or use malloc() to adhere to the C++ Standard,

int *left = (int *) malloc(sizeof(*left) * n / 2);
int *right = (int *) malloc(sizeof(*left) * n / 2);

so that left wouldn't be affected, but that's not what I'm asking. Why does it only happen when splitting an array size of 9? Why was left[0] overwritten? Is this is a bug in g++ that should be reported or is the problem something else?

AcidResin
  • 134
  • 2
  • 12
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/232407/discussion-on-question-by-acidresin-why-are-variable-length-arrays-in-c-overla). – Samuel Liew May 14 '21 at 13:31

1 Answers1

3

It says that the value of left[0] is modified after the 5th iteration:

That is your answer. The problem occurs in the fifth iteration over an array with four elements.

When size is odd, the calculation of size/2 rounds down. So the sum size/2 + size/2 is strictly less than size, yet your loops ensure that all size elements from the original array are assigned somewhere. Something has to be assigned to an unexpected location. We call this "undefined behavior", and whatever the compiler does at that point is correct according to the C++ standard. (Whatever happens, the compiler gets to blame your code for it.) It just happens that when size is 9, the compiler used left[0] as the location for right[4].

Behind the scenes, the left and right arrays are probably more-or-less adjacent in memory. The layout would have right[0] through right[size/2], then possibly some unused space (also known as "padding"), then left[0] through left[size/2]. When you access one-past the last element of right, you end up either in the unused space or in left[0]. When you overwrite the unused space, you see no symptoms since that space is otherwise unused. However, when you overwrite left[0] you definitely see a symptom.

Your compiler apparently uses padding to make sure the arrays are aligned to 4*sizeof(int). (Must be faster that way, as compilers rarely introduce waste without a reason. Still, I am surprised it's not 2*sizeof(int) instead.) That is, there is no padding when size/2 is a multiple of 4. If this guesswork is accurate, you should see this behavior when size is odd and size/2 is a multiple of 4; that is when size is one more than a multiple of 8, as in 9, 17, 25, 33, etc.

JaMiT
  • 14,422
  • 4
  • 15
  • 31
  • Explanation: `for(i = size / 2, j = 0; i < size; i++, j++)` in the case of `size` == 9 resolves to `for(i = 4, j = 0; i < 9; i++, j++)`. This will iterate `i` over 4, 5, 6, 7, and 8 for a total of five iteration. This means `j` ranges 0 to 4, and there is no `right[4]` – user4581301 May 12 '21 at 18:07
  • @user4581301 So `right[4]` can be anywhere, and it was by coincidence in `left[0]` when array size is 9? – AcidResin May 12 '21 at 18:08
  • 1
    Exactly. If the system's automatic storage is stack-based, and most are, when you write to `right[4]` you write over (AKA "smash") some variable around `right` in the stack. – user4581301 May 12 '21 at 18:10
  • @user4581301 But why does it not happen when array size is 11? – AcidResin May 12 '21 at 18:12
  • 1
    Hard to say, but probably is the result of padding. For performance reasons variables are often aligned to make them easier and faster to access. If the arrays are 5 elements long and the compiler aligned the data to 64 bits there may be an empty 4 bytes between `right` and `left`. And maybe not. There's not much value in reasoning about undefined behaviour specifically because it's undefined. Whatever you learn may only apply to that compiler on that CPU building for that OS. – user4581301 May 12 '21 at 18:17
  • I just tried, it did overwrite when array size is 17. – AcidResin May 12 '21 at 18:20
  • 1
    Side note: If you are writing code for exactly one OS/compiler/hardware platform, you can abuse the living hell out of UB. You will often see some seriously dodgy code in the C++ Standard Library implementations shipped with a compiler because the library was written specifically for that compiler and the writers are intimately familiar with the compiler and know when they can get away with a wee bit of UB in order to reap huge performance gains. Another use for UB is inserting debug code. Visual Studio adds range checks to their debug libraries so you can sometimes catch bad behaviour early. – user4581301 May 12 '21 at 18:24
  • Please mention why `malloc()` still worked even though the size was wrong. – AcidResin May 13 '21 at 15:44
  • @AcidResin You specifically stated in the question that you are not asking about your use of `malloc()` (which is good, since there should be only one question per question). Describing the `malloc()` approach in an answer would be off-topic. – JaMiT May 13 '21 at 22:05
  • @AcidResin Besides which, the explanation for why undefined behavior "worked" is almost always that, as luck would have it, nothing bad happened in that particular scenario. [Undefined behavior is not required to behave badly.](https://stackoverflow.com/questions/6441218/can-a-local-variables-memory-be-accessed-outside-its-scope/6445794#6445794) – JaMiT May 13 '21 at 22:16
  • No, I did not say that. What I meant was I can fix the code to use the correct size (or use `malloc()`) but that's wasn't the point of my question. And I think it's still related though. `malloc()` pads the variables differently that's why it was still working correctly. – AcidResin May 14 '21 at 01:48
  • @AcidResin Read your question again. You wrote "that's not what I'm asking", with "that" referring to the only part of your question where `malloc` appears. If you meant something different, you should have [written what you meant](https://www.goodreads.com/quotes/7921298-then-you-should-say-what-you-mean-the-march-hare). At this point (with an answer already provided), you are stuck with what you wrote, even if it is not what you meant. You could try a new question asking about `malloc`, but there is a high chance of it being poorly received for the reason in my earlier comment. – JaMiT May 14 '21 at 05:18