0

I've come across a really weird bug in a C++ program I wrote for a school assignment (code pasted at the end), and I can't figure out why it's doing what it is. Particularly, it's sometimes giving randomly incorrect output, sometimes giving correct output, being run on the same input each time. If anyone might have insight as to why, I'd be very appreciative:

I've made a C++ program that has an implementation of a simple MaxHeap data structure which supports building the heap either using HeapInsert to insert elements into the heap one by one starting with an empty heap, and alternatively starting with an array of elements and using bubbledown on the first half of the elements to transform it into a heap -- the program takes one command-line argument, either HeapInsert which will use the first method of building a heap, or BubbleDown which will build the heap using the second method.

The program takes user input from cin: first the number of elements that will be given to make the heap out of, then the elements you want to put in the heap. After its done it outputs the number of exchanges performed in the bubbleup/bubbledown, and then the elements of the heap in order that they are in the array that stores the heap.

We've been given a sample input (of 100 random numbers) and a sample output that my code should produce in order to know our implementation is correct. I do the following on the command line:

g++ HeapTest.cpp
./a.out BubbleDown < 100.txt > out
diff out s100b.txt

100.txt is the sample input, s100b.txt is the correct sample output.

Executing the lines

./a.out BubbleDown < 100.txt > out
diff out s100b.txt

repeatedly, I get inconsistent results. It seems that half the time I get that my output matches the sample file completely, but half the time it does not, and particularly when I look at my output file, it looks like a random large number has been inserted into my heap for no reason, making my output wrong.

It makes absolutely no sense to me that the results would be inconsistent in running the code repeatedly with the exact same input. This only happens when I use the "BubbleDown" option on the command line. Below is my code:

#include <cstdlib>
#include <stdint.h>       
#include <iostream>
#include <string>
#include <cstring>
#include <cassert>
#include <cmath>
using namespace std;

struct MaxHeap { //MaxHeap data structure
    int n;      //size of the heap
    int numex;  //number of exchanges in building the heap
    int* A;     //Array storing the actual heap
    MaxHeap(int a){     //First Constructor: initializes an empty heap of size 0 in an array of size a
        n=0;    //initialize size to 0
        numex=0;//initialize numex to 0
        A = new int[a]; //allocate space for array of size A on heap
    }
    MaxHeap(int * data, int a){ //Second Constructor: consumes array of a elements and creates a heap
                            //out of thoses elements using bubbledown
        n = a;
        A = data;
        numex = 0;

        for(int k = (int)(floor((n-1)/2)); k > -1 ; k-=1){
            bubbledown(k);
        }
    }
    ~MaxHeap(){}    //necessary since MaxHeaps made with first constructor are non-contiguous
    void bubbleup(int v){//bubble-up algorithm as described in class
        int j;
        while( (v != 0) && (A[(int)(floor((v-1)/2))] < A[v]) ){
            numex +=1;
            j = A[v];
            A[v] = A[(int)(floor((v-1)/2))];
            A[(int)(floor((v-1)/2))] = j;
            v = (int)(floor((v-1)/2));

        }

    }
    void bubbledown(int v){//bubbledown algorithm as described in calss

        int j;
        int k;
        int L;
        int temp;
        while(true){

            j = 2*v+1;
            k = 2*v+2;
            if((j <= n) && (A[j] > A[v])){L = j;}
            else{L = v;}
            if((k <= n) && (A[k] > A[L])){L = k;}
            if(L == v){break;}
            else{numex +=1; temp = A[v]; A[v] = A[L]; A[L] = temp; v=L;}
        }

    }
    void HeapInsert(int i, int k){//heapinsert algorithm as described in class

        n=k+1;
        A[n-1] = i;
        bubbleup(n-1);

    }
};

void error(){

    cerr << "Usage: " << endl;
    exit(-1);

}

int main(int argc, char * argv[]){

    int flag;
    char hins[] = "HeapInsert";
    char bdwn[] = "BubbleDown";

    switch(argc){    
        case 2:
            if(strcmp(argv[1], hins) == 0){flag=0; break;}
            else if(strcmp(argv[1], bdwn) == 0){flag=1; break;}
            else{error();}
        default: error();
    }

    if(flag==0){//If HeapInsert option selected, the below creates a heap via HeapInsert

        int nelem;
        cin >> nelem;       //read in number of elements that are going to be given
        struct MaxHeap H = MaxHeap(nelem);  //call first constructor

        for(int k=0; k < nelem; k+=1){      //insert elements into the heap one by one as they are read in
            int i;
            cin >> i;
            H.HeapInsert(i,k);
        }

        cout << H.numex << endl;            //print number of exchanges

        for(int k =0;k < nelem; k+=1){      //print elements of heap 1 by 1

            cout << H.A[k] << endl;

        }

    }
    else{       //if BubbleDown option chosen by user

        int nelem;
        cin >> nelem;   //read in number of elements
        int data[nelem];    //initialize array to store that number of elements

        for(int k=0; k < nelem; k+=1){  //build array of elements in order given

            int i;
            cin >> i;
            data[k] = i;

        }

        struct MaxHeap H = MaxHeap(data, nelem);    //use second constructor to create a heap out of the array

        cout << H.numex << endl;            //print number of exchanges

        for(int k =0;k < nelem; k+=1){      //print out elements 1 by 1

            cout << H.A[k] << endl;

        }

    }
}

If anyone has any idea of how my code can be producing inconsistent results like this when it doesn't rely on any randomness or memory allocation (no memory allocation used when BubbleDown option is given), insight would be greatly appreciated!

Moog
  • 10,193
  • 2
  • 40
  • 66
user990408
  • 21
  • 2
  • 5
  • 6
    `int data[nelem];` is not valid C++. Variable length arrays are not part of C++. – Mahesh Oct 11 '11 at 22:46
  • 3
    Looks like it's time to pull out `valgrind`. – Billy ONeal Oct 11 '11 at 22:50
  • You don't need to say `struct` in C++, and you should use direct initialization rather than copy initialization: `MaxHeap H(nelem);` (This isn't related to your problem, just to your learning C++.) Also, you could use `std::string`s, which you can compare directly. – Kerrek SB Oct 11 '11 at 22:50
  • What's the point of `n` in `HeapInsert()`? – Kerrek SB Oct 11 '11 at 22:53
  • Some of the comments indicate great confusion: `~MaxHeap(){} //necessary since MaxHeaps made with first constructor are non-contiguous` Why don't you forget about the class for the time being and implement this as a free function. ` – UncleBens Oct 11 '11 at 22:55
  • @BillyONeal: did that. Is not the problem. Just the algorithm is borked (it's getting bogus sequences out with some input repeated many times) – sehe Oct 11 '11 at 22:55
  • Just a remark: it's very good that you use comments, but try to explain why, not what. In your case good variable names would be more helpful. – ruslik Oct 11 '11 at 22:56
  • @Billy ONeal - My thoughts exactly ;) Would've answered this faster but I couldn't remember how to get line numbers :D – Brendan Long Oct 11 '11 at 23:16
  • @sehe: It seemed useful for generating Brendan's answer. – Billy ONeal Oct 12 '11 at 02:21
  • @BillyONeal: shame on me - I overlooked it! I wasn't aware that valgrind's `--db-attach=yes` prompt WON'T pause execution trigger while input is being redirected. So I learned a valuable lesson here. Thanks for drawing my attention! – sehe Oct 12 '11 at 07:23

1 Answers1

8

I compiled your program with debug symbols...

gcc -g -O0 -o stuff stuff.cpp

And ran it in Valgrind...

echo '4 2 3 4 5 6' | valgrind ./stuff BubbleDown

Here's what it said:

==28605== Conditional jump or move depends on uninitialised value(s)
==28605==    at 0x401186: MaxHeap::bubbledown(int) (stuff.cpp:52)
==28605==    by 0x400FCD: MaxHeap::MaxHeap(int*, int) (stuff.cpp:26)
==28605==    by 0x400E08: main (stuff.cpp:125)

Which seems to correspond to this:

if((j <= n) && (A[j] > A[v])){L = j;}

The problem seems to be that you're reading off the end of the array. If j == n, then it's one element past the end of the array. Same with k == n. If you change bubbledown to this, the problem goes away:

void bubbledown(int v){//bubbledown algorithm as described in calss
    while(true){
        const int j = 2*v+1;
        const int k = 2*v+2;
        int L;

        // notice < instead of <=
        if((j < n) && (A[j] > A[v])){
            L = j;
        }
        else{
            L = v;
        }

        // notice < instead of <=
        if((k < n) && (A[k] > A[L])){
            L = k;
        }
        if(L == v){
            break;
        }
        else{
            numex +=1;
            const int temp = A[v];
            A[v] = A[L];
            A[L] = temp;
            v = L;
        }
    }
}

Note: I used some Linux commands to do this (Valgrind most importantly). Whatever compiler toolchain / IDE you're using should have its own debugger which can presumably give you similar output. There's a Stack Overflow question about Valgrind substitutes in Windows. I suggest finding a tool like that you like -- It will make C++ debugging much easier.

Community
  • 1
  • 1
Brendan Long
  • 53,280
  • 21
  • 146
  • 188
  • thank you immensely. I tried valgrind for the first time and wasn't sure where to find what was uninitialized. What I'm more curious about, though, is why this doesn't always cause a problem when running on the exact same input? Going off the end of the array should always cause a problem, no? – user990408 Oct 11 '11 at 23:42
  • @user990408 - To make valgrind show line numbers, compile with debug symbols (`-g`) and no optimization (`-O0`). C++ itself doesn't care if you go off the end of the array. There's a good explanation in [this question](http://stackoverflow.com/questions/671703/array-index-out-of-bound-in-c). In your case it won't crash because the memory belongs to you (you're reading data right after `data`'s location on the stack, which is memory you own), but you don't know what that data is (so it appears to act randomly). – Brendan Long Oct 11 '11 at 23:53
  • There are some languages that try to make this work better, like Java, which will throw an exception if you try to read off the end of an array or use an uninitialized variable, but C++ is designed to be "close to the metal", so it's not able to check things like that for you. – Brendan Long Oct 11 '11 at 23:55