
I'm trying to understand possible optimization methods for the bubble sort algorithm. I know there are better sorting methods, but I'm just curious.

To test the efficiency I'm using std::chrono. The program sorts an int array of 10000 numbers 30 times and prints the average sorting time. The numbers are picked randomly (up to 1000, via rand() % 1000) in every iteration. Here is the code, with no optimization:

#include <iostream>
#include <ctime>
#include <chrono>
using namespace std;



int main() {


    //bubble sort
    srand(time(NULL));

    chrono::time_point<chrono::steady_clock> start, end;

    const int n = 10000;
    int i, j, tests = 30, arr[n];
    long long total = 0;
    bool out;

    while (tests-->0) {


        for (i = 0; i < n; i++) {
            arr[i] = rand() % 1000;
        }


        j = n;

        start = chrono::steady_clock::now(); // must match the declared time_point type
        while(1){

            out = false;
            for (i = 0; i < j - 1; i++) {

                if (arr[i + 1] < arr[i]) {
                    swap(arr[i + 1], arr[i]);
                    out = true;
                }
            }

            if (!out) {
                    break;
            }

            //j--;

        }


        end = chrono::steady_clock::now();

        total += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
        cout << "Remaining :"<<tests << endl;

    }

    cout << "Average  :" << total / static_cast<double>(30)/1000000000<<" seconds"; // tests(30)  + nanosec -> sec


    cin.sync();
    cin.ignore();
    return 0;
}

I get 0.17 seconds average sorting time.

If I uncomment the j--; line (commented out above) to avoid comparing numbers that are already in place, I get a 0.12 seconds average sorting time, which is understandable.
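
For reference, here is a minimal sketch (not from the original post) of just that variant, pulled out into a function: each pass bubbles the largest remaining element to position j - 1, so the next pass can stop one element earlier.

#include <utility> // std::swap

// Bubble sort with the j-- optimization: each pass leaves the largest
// remaining element at position j - 1, so the scanned range shrinks by one.
void bubbleSortShrink(int *arr, int n) {
    int j = n;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (int i = 0; i < j - 1; i++) {
            if (arr[i + 1] < arr[i]) {
                std::swap(arr[i + 1], arr[i]);
                swapped = true;
            }
        }
        j--; // arr[j - 1] is now in its final position
    }
}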

If I remember the last position where a swap took place, I know that the elements after that index are already sorted, so later passes only need to run up to that position. It's better explained in the second part of this post: https://stackoverflow.com/a/16196115/1967496. This is the code that implements the new possible optimization:

#include <iostream>
#include <ctime>
#include <chrono>
using namespace std;



int main() {


    //bubble sort
    srand(time(NULL));

    chrono::time_point<chrono::steady_clock> start, end;

    const int n = 10000;
    int i,j, last, tests = 30,arr[n];
    long long total = 0;
    bool out;

    while (tests-->0) {


        for (i = 0; i < n; i++) {
            arr[i] = rand() % 1000;
        }


        j = n;

        start = chrono::steady_clock::now(); // must match the declared time_point type
        while(1){

            out = false;
            for (i = 0; i < j - 1; i++) {

                if (arr[i + 1] < arr[i]) {
                    swap(arr[i + 1], arr[i]);
                    out = true;
                    last = i;
                }
            }

            if (!out) {
                    break;
            }

            j = last + 1;

        }


        end = chrono::steady_clock::now();

        total += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
        cout << "Remaining :"<<tests << endl;

    }

    cout << "Average  :" << total / static_cast<double>(30)/1000000000<<" seconds"; // tests(30)  + nanosec -> sec


    cin.sync();
    cin.ignore();
    return 0;
}

Note the last = i; and j = last + 1; lines. And here comes the problem: the average time is now again around 0.17 seconds. Is there a problem in my code, or am I missing something?

Update:

I did the sorting with 10 times more numbers (100000) and now get the following results:

  • No optimization: 19.3 seconds
  • First optimization (j--): 14.5 seconds
  • Second (supposed) optimization (j = last + 1): 17.4 seconds

From my understanding, the second method should never do more work than the first (the new bound j = last + 1 is always at most the j - 1 that the first method would use), but the numbers tell something else.
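
One way to check that intuition is to count comparisons instead of wall time. Below is a minimal sketch (not from the original post; the function names are mine) that runs both variants on copies of the same input and reports how many comparisons each performs:

#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

// First optimization: shrink the scanned range by one per pass.
long long countComparisonsShrink(vector<int> a) {
    long long cmp = 0;
    int j = a.size();
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (int i = 0; i < j - 1; i++) {
            ++cmp;
            if (a[i + 1] < a[i]) { swap(a[i + 1], a[i]); swapped = true; }
        }
        j--;
    }
    return cmp;
}

// Second optimization: shrink the scanned range to just past the last swap.
long long countComparisonsLastSwap(vector<int> a) {
    long long cmp = 0;
    int j = a.size();
    while (true) {
        int last = -1;                 // index of the last swap in this pass
        for (int i = 0; i < j - 1; i++) {
            ++cmp;
            if (a[i + 1] < a[i]) { swap(a[i + 1], a[i]); last = i; }
        }
        if (last < 0) break;           // no swap: the array is sorted
        j = last + 1;
    }
    return cmp;
}

int main() {
    vector<int> v(10000);
    for (int &x : v) x = rand() % 1000;
    cout << "j--       : " << countComparisonsShrink(v) << " comparisons\n";
    cout << "last swap : " << countComparisonsLastSwap(v) << " comparisons\n";
}

The second variant should report at most as many comparisons as the first, so if it is still slower, the difference must come from per-iteration costs (such as the extra last = i store inside the swap branch), not from the amount of comparison work.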


1 Answer


Well... there might not be a single right or wrong answer to this question.

First of all, when you're comparing only 10000 elements, you cannot really call it an efficiency test. Try comparing a much higher number of elements, maybe 500000 (although you will probably need to allocate the array dynamically for that; see the sketch below).
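
A minimal sketch of such a setup, using std::vector for the dynamic allocation (the size here is just an example):

#include <cstdlib>
#include <vector>
using namespace std;

int main() {
    const int n = 500000;   // too large to comfortably live on the stack
    vector<int> arr(n);     // heap-allocated, unlike the int arr[n] above
    for (int i = 0; i < n; i++)
        arr[i] = rand() % 1000;
    // ... time the bubble sort passes on arr exactly as before
}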

Second of all, it might be the compiler. Compilers often optimize the generated code so that the program runs faster, which can distort micro-benchmarks like this one (for example, results at -O0 and -O2 can differ a lot).

  • I tried with 100000 elements and posted the results in the first post. I still get strange results. I don't think it's the compiler's fault, as the logically better optimization takes longer than the simple j-- one. – cpper Oct 20 '16 at 17:59
  • Just turn off the optimization on the compiler and then compare the results. – s g Oct 20 '16 at 19:44
  • It might just not be an optimization at all. Unless your array is almost sorted, it might require more work than the first version: you have to do at least 2 additional assignments per iteration, and if your array is not almost sorted it will do more of them, scanning almost all the way to the end of the array. – zemiret Oct 20 '16 at 20:27