
Which is the best sorting technique to sort the following array, and if there are duplicates, how should they be handled?

int a[] = {1, 3, 6, 7, 1, 2};

Also, which is the best sorting technique of all?

/* Bubble sort: O(n^2) comparisons; repeatedly swaps adjacent
   out-of-order elements, so after pass i the last i slots are final. */
void BubbleSort(int a[], int array_size)
{
    int i, j, temp;
    for (i = 0; i < array_size - 1; ++i)
    {
        for (j = 0; j < array_size - 1 - i; ++j)
        {
            if (a[j] > a[j+1])
            {
                /* swap a[j] and a[j+1] */
                temp = a[j+1];
                a[j+1] = a[j];
                a[j] = temp;
            }
        }
    }
}
jww
Rajeev
    See: http://en.wikipedia.org/wiki/Sorting_algorithm – Donotalo Oct 08 '10 at 20:13
    There is no "best sorting technique of all", it depends on the size of your data and if it is somewhat sorted at the beginning. I'd suggest you to read http://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_of_algorithms and the whole Wikipedia article as well. – schnaader Oct 08 '10 at 20:13
  • "best" depends on the data and other constraints: memory, speed, how mis sorted to start. quicksort is a great compromise among those. bubble sort is a best for small memory. What do you want to accomplish? – dawg Oct 08 '10 at 20:16
  • The best (if best == fastest) sorting technique would be to get the data such that it's already sorted. – Nick T Oct 08 '10 at 20:17
  • "following array" = "preceding array"? If yes, the fastest is to write it down sorted. Seriously, I do this in generated code. – Peter G. Oct 08 '10 at 20:41
  • @drewk: Bubble Sort is *not* a best for small memory. Its only good property is that it is an in-place sort, but that's also true for quick sort, heap sort and others, and all of those are O(n.log(n)) instead of O(n^2) like bubble sort. – kriss Oct 08 '10 at 20:45
  • @kriss: Granted and agreed. I misspoke about smallest memory and I meant "simplest." Bubble may be the simplest code or concept. Some sort algorithms can get PhD level esoteric and hard to understand. Some can get unstable or crash on edge cases. Bubble sort is probably *not* the smallest and certainly not swiftest, but I bet my 8 year old could understand it! – dawg Oct 08 '10 at 21:56
  • @drewk: I'm not so sure, you should try and see. Some algorithms are indeed esoteric, but others are really intuitive. I happened to explain quick sort to my daughter of 6 (was sorting a stack of magazines by issue number) and she understood it perfectly. With Bubble Sort it's difficult to avoid the in place thing and I first would have to explain her computer arrays... by the way, radix sort is even simpler to explain to non computer peoples. Just call it "postman sort" and say things like "make a stack for each year" and later, "in each year stack make a stack for month". – kriss Oct 09 '10 at 02:27
  • @drewk: Bubble sort is not "best for small memory". It's slow as hell and heap sort uses the same amount of memory (i.e. it's in-place) while running very fast (i.e. `O(n log n)`). – R.. GitHub STOP HELPING ICE Oct 09 '10 at 03:17

5 Answers


In C, you can use the built-in qsort function:

int compare( const void* a, const void* b)
{
     int int_a = * ( (int*) a );
     int int_b = * ( (int*) b );

     if ( int_a == int_b ) return 0;
     else if ( int_a < int_b ) return -1;
     else return 1;
}

qsort(a, 6, sizeof(int), compare);

See: https://web.archive.org/web/20220119112150/http://www.cplusplus.com/reference/cstdlib/qsort/


To answer the second part of your question: an optimal comparison-based sorting algorithm is one that runs with O(n log(n)) comparisons. There are several that have this property (including quick sort, merge sort, heap sort, etc.), but which one to use depends on your use case.

As a side note, you can sometimes do better than O(n log(n)) if you know something about your data – see the Wikipedia article on Radix Sort
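For instance, since the values in the question's array are small non-negative integers, a counting sort finishes in O(n + k) and copes with duplicates for free (a bucket simply counts more than one). A minimal sketch, where MAX_VAL is an assumed upper bound chosen to cover the example array:

```c
#include <string.h>

#define MAX_VAL 7  /* assumed upper bound on values, from the example array */

/* Counting sort: O(n + k); only valid for small non-negative keys. */
void counting_sort(int a[], int n)
{
    int count[MAX_VAL + 1];
    int i, v, out = 0;

    memset(count, 0, sizeof count);

    for (i = 0; i < n; ++i)
        count[a[i]]++;              /* tally each value; duplicates just raise the count */

    for (v = 0; v <= MAX_VAL; ++v)  /* emit the values in increasing order */
        while (count[v]-- > 0)
            a[out++] = v;
}
```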

Alex Reece
    @Alex: if you want it fast, at least provide a decent compare function! qsort does not need the returned values to be -1, 0, 1, but "any negative number", 0, "any positive number", hence you just have to do `return *((int*)a)-*((int*)b);` which is much faster than your proposal. – kriss Oct 08 '10 at 21:02
    @kriss: your comparison isn't well-defined in case of integer overflow; therefore, one often sees things like `return (a > b) - (a < b)` – Christoph Oct 08 '10 at 22:04
  • @kriss: except that comparison function doesn't (necessarily) work. What happens if `a` is `INT_MAX` and `b` is `-1`, for example? – Stephen Canon Oct 08 '10 at 22:04
  • @Stephen Canon: Agreed, you should use formulas like Christoph's when you know nothing of your data range and overflow can happen. In practical cases I never saw a single occurrence when dealing with signed numbers where I didn't have some rough idea of the data range (and my formula is also fine for unsigneds). My point was mostly that the compare API result type is not -1,0,1 (or we couldn't even use strcmp for comparing char*). – kriss Oct 09 '10 at 02:34
  • Quick sort does not run in `O(n log n)` time. It runs in `O(n^2)` time. People who claim otherwise need to look up what big O means. – R.. GitHub STOP HELPING ICE Oct 09 '10 at 03:18
  • @kriss: returning the difference **does not work** due to integer overflow issues! Unless you're adept at avoiding overflows, you'd better just use a conditional and return -1/0/1, which is much less error-prone. – R.. GitHub STOP HELPING ICE Oct 09 '10 at 03:20
  • One thing to note: the take-the-difference approach **does work** whenever the data type is strictly smaller in range than `int`. – R.. GitHub STOP HELPING ICE Oct 09 '10 at 03:27
  • @R..: Yes, it works if the domain of values is small enough, and that is often the case. And for unsigneds it works provided both unsigneds are smaller than INT_MAX (not UINT_MAX). Usually I'm using the C programming language when I want (real) speed or bit-level control. If the problem lies elsewhere (complexity of algorithms or such) Python is usually my preferred choice. But OK, when doing such things you'd better know exactly what you are doing. – kriss Oct 09 '10 at 15:44
  • @R..: big O does not imply worst case; QuickSort is O(n log n) on average, and the naive implementation can be changed in such a way that it behaves the same in the worst case (it just needs a pivot chosen from more values). The change is small enough that the modified version is usually still called QuickSort. – kriss Oct 09 '10 at 15:49
  • @R..: with the **naive implementation** of QuickSort, when you always take say the first item as pivot and when you are looking for the worst-case complexity, it is indeed O(n^2). But with randomized data it is proved that QuickSort's complexity is O(n.log(n)) on average (and that's probably what people claiming that QuickSort is O(n.log(n)) mean). Big O notation does not imply you are speaking of the worst case. Second, very minor changes to QuickSort (so minor you still call it QuickSort) can make it O(n.log(n)) in the worst case. – kriss Oct 09 '10 at 08:01
  • @R..: you probably haven't read my comment above. With real data you usually work in a domain much smaller than the int domain. But when programming in C people just call their types int, even if they are not really ints and the data can never go out of domain. In many cases domain restrictions guarantee you'll never get an overflow. In these cases (indeed restricted, but with an acceptable restriction) the method works fine. Another similar typical case is with unsigneds, if you know your input values are in range 0..INT_MAX (not UINT_MAX). – kriss Oct 09 '10 at 08:11
    @kriss: This use of notation is simply wrong. Even if it's randomized, it **can** hit cases where it takes quadratic time. Therefore, the **big O is quadratic**. Big O **always** means **worst case**. Use different notations for ridiculous "average case" complexity estimates. – R.. GitHub STOP HELPING ICE Oct 09 '10 at 09:45
    @kriss: If I say an algorithm is `O(f(n))` in time, that means the time it takes to run is **bounded by a constant multiple of `f(n)`**, where the particular constant is implementation-dependent but constant within an implementation, for **all possible inputs**. Claiming quicksort is `O(n log n)` is as absurd as claiming `if (rand()==42) return find_prime_factors(n); else return NULL;` is `O(1)` with respect to `n`. – R.. GitHub STOP HELPING ICE Oct 09 '10 at 22:10
  • @R. Actually, there is a version of quicksort that is guaranteed to run in O(n log n ) - use quickselect and median of 5 to find the true median in O( n ) and then recurse on the appropriate halves. `T(n) = n + 2T(n/2) = O( n log n )` – Alex Reece Oct 09 '10 at 23:37
  • @R..: are you confusing data and size of data? The Big-O notation parameter is the size of the data, but what is averaged is the data. Have a look at http://en.wikipedia.org/wiki/Average_case_analysis#Worst-case_versus_average-case_performance. What I remember from my University time is as follows: consider all possible inputs to function f(). Some (T1) have a `k1*n*log(n)` runtime, some others (T2) degenerate and are `k2*n^2`. Let r(n) be the number of T2 cases. The `average O of f(n)` is `O((r(n)*k2*n^2 + (n-r(n))*k1*n*log(n))/n)`. Say the number of degenerate cases is log(n) or less; then the average `O(f(n)) = O(log(n)*n)` – kriss Oct 10 '10 at 21:21
  • @Alex: okay, I found the reference on Wikipedia, and apparently a variant of quicksort can be made to run in `O(n log n)` time. I would argue that this is a sufficiently more advanced algorithm that it's not fair to equate it with quicksort, but the pivot principle is the same. – R.. GitHub STOP HELPING ICE Oct 10 '10 at 23:28
    @kriss: averaging is **absolutely irrelevant**. Big O is a matter of bounding and has nothing to do with average performance. My example with `rand()` was that it's easy to write a function where average performance is fast but worst-case is arbitrarily slow. As Alex has pointed out, it's apparently possible to make a variant of quicksort that runs in `O(n log n)` time, but your use of big O terminology is still incorrect. – R.. GitHub STOP HELPING ICE Oct 10 '10 at 23:31
  • @R..: Well, space is a bit short in comments to give you the maths; I thought my above example was enough, but it seems not to be (did you read it anyway?). I guess I'll have to open a question on the subject. If you are curious you can find the full math and details on many sites. Big O is often used as "worst case" because if your worst case has a good complexity you'll have a low run-time. But if you work on cryptanalysis, what you'll be looking for is "best case", because you want your code to be hard to break for any input. Average case is in between, and big O is applied **after** averaging. – kriss Oct 11 '10 at 08:25
  • +1, I would edit to replace the link to an archived version but the edit review queue is full.., there it's https://web.archive.org/web/20220119112150/http://www.cplusplus.com/reference/cstdlib/qsort/ if someone can edit. – yagmoth555 Oct 13 '22 at 19:04

In your particular case the fastest sort is probably the one described in this answer. It is optimized for exactly an array of 6 ints and uses sorting networks. It is 20 times faster (measured on x86) than the library qsort. Sorting networks are optimal for sorting fixed-length arrays. As they are a fixed sequence of instructions, they can even be implemented easily in hardware.
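To illustrate the idea (the linked answer benchmarks several tuned variants; this is a plain sketch, not that answer's code), here is one standard 12-comparator network for exactly 6 ints: the first six exchanges sort each half, the last six merge them, and the sequence is the same regardless of the input:

```c
/* Compare-exchange: put the smaller value in *x, the larger in *y. */
static void swap_if_greater(int *x, int *y)
{
    if (*x > *y) { int t = *x; *x = *y; *y = t; }
}

/* 12-comparator sorting network for exactly 6 elements. */
void sort6_network(int d[6])
{
    /* sort d[0..2] and d[3..5] with two 3-element networks */
    swap_if_greater(&d[1], &d[2]); swap_if_greater(&d[4], &d[5]);
    swap_if_greater(&d[0], &d[2]); swap_if_greater(&d[3], &d[5]);
    swap_if_greater(&d[0], &d[1]); swap_if_greater(&d[3], &d[4]);
    /* merge the two sorted halves */
    swap_if_greater(&d[1], &d[4]); swap_if_greater(&d[0], &d[3]);
    swap_if_greater(&d[2], &d[5]); swap_if_greater(&d[1], &d[3]);
    swap_if_greater(&d[2], &d[4]); swap_if_greater(&d[2], &d[3]);
}
```

Because the comparison pattern is data-independent, the branches in swap_if_greater can be replaced by min/max operations, which is where the hardware-friendliness comes from.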

Generally speaking, there are many sorting algorithms optimized for specialized cases. General-purpose algorithms like heap sort or quick sort are optimized for in-place sorting of an array of items. They yield a complexity of O(n.log(n)), n being the number of items to sort.

The library function qsort() is very well coded and efficient in terms of complexity, but it uses a call to a comparison function provided by the user, and this call has a quite high cost.

For sorting very large amounts of data, algorithms also have to take care of swapping data to and from disk. This is the kind of sort implemented in databases, and your best bet if you have such needs is to put the data in a database and use the built-in sort.

kriss

I'd like to make a change to the comparison function above. In C, you can use the built-in qsort function:

int compare( const void* a, const void* b)
{
   int int_a = * ( (int*) a );
   int int_b = * ( (int*) b );

   // branch-free, overflow-safe comparison: yields -1, 0 or 1
   return (int_a > int_b) - (int_a < int_b);
}

qsort(a, 6, sizeof(int), compare);
Thomas

Depends

It depends on various things. But in general, algorithms using a divide-and-conquer / dichotomic approach perform well on sorting problems, as they offer interesting average-case complexities.

Basics

To understand which algorithms work best, you will need basic knowledge of algorithmic complexity and big-O notation, so you can understand how algorithms rate in terms of average-case, best-case and worst-case scenarios. If required, you'd also have to pay attention to the sorting algorithm's stability.

For instance, an efficient algorithm is usually quicksort. However, if you give quicksort a perfectly inverted list, then it will perform poorly (a simple selection sort will perform better in that case!). Shell sort would also usually be a good complement to quicksort if you perform a pre-analysis of your list.
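To make the pivot problem concrete, here is a hedged sketch of quicksort with median-of-three pivot selection, the simple change usually cited for avoiding the degenerate behaviour on sorted or inverted input (illustrative only, not how any particular library implements it):

```c
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Quicksort with median-of-three pivot selection and Hoare partitioning.
   Sort a[lo..hi] inclusive; call as quicksort(a, 0, n - 1). */
void quicksort(int a[], int lo, int hi)
{
    int mid, pivot, i, j;

    if (lo >= hi)
        return;

    /* order a[lo], a[mid], a[hi]; the median lands in a[mid] */
    mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap_int(&a[mid], &a[lo]);
    if (a[hi]  < a[lo]) swap_int(&a[hi],  &a[lo]);
    if (a[hi]  < a[mid]) swap_int(&a[hi], &a[mid]);
    pivot = a[mid];

    /* Hoare partition around the pivot value */
    i = lo - 1;
    j = hi + 1;
    for (;;) {
        do { ++i; } while (a[i] < pivot);
        do { --j; } while (a[j] > pivot);
        if (i >= j)
            break;
        swap_int(&a[i], &a[j]);
    }
    quicksort(a, lo, j);
    quicksort(a, j + 1, hi);
}
```

On a perfectly inverted input the median-of-three pivot is close to the true median, so the recursion stays balanced instead of degenerating to O(n^2).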

Have a look at the following, for "advanced searches" using divide and conquer approaches:

And these more straightforward algorithms for less complex ones:

Further

The above are the usual suspects when getting started, but there are countless others.

As pointed out by R. in the comments and by kriss in his answer, you may want to have a look at HeapSort, which provides a theoretically better worst-case sorting complexity than quicksort (but often won't fare better in practical settings). There are also variants and hybrid algorithms (e.g. TimSort).

haylem
  • If you provide a perfectly inverted list to quicksort it will degenerate only in the most naive implementation (always take the head of the list as pivot), and even then it won't be worse than BubbleSort. The naive QuickSort would also perform poorly with an already sorted list. But very simple changes to the algorithm are enough to avoid the problem (extract several numbers from the list as potential pivots and choose the median as pivot). – kriss Oct 08 '10 at 20:56
  • @kriss: Correct. But this is a CS-learning question, and so I just talk about the theoretical and basic implementation of each of these approaches. Obviously you can tweak algorithms and minimize these side-effects, but as the OP is asking about general sorting issues, I think it's more in-line to pinpoint these issues. – haylem Oct 08 '10 at 23:33
  • @haylem: it's indeed probably a learning question, but the risk speaking about naive implementations is for the reader to believe that the library call qsort is a naive implementation of QuickSort, which it is not, and would degenerate on sorted data set. If I remember correctly it is not even a QuickSort in most implementations. – kriss Oct 09 '10 at 02:41
  • You left out **heap sort**, which is quite arguably the ideal sort (`O(1)` space and `O(n log n)` time). – R.. GitHub STOP HELPING ICE Oct 09 '10 at 03:24
  • @R.: I left out many of them I guess :) But you're right, I should have mentioned heap-sort. – haylem Oct 09 '10 at 09:59

The best sorting technique generally depends upon the size of the array. Merge sort can be among the best of all, as it offers guaranteed O(n log n) time complexity at the cost of O(n) extra space, which suits large arrays well.
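To make that trade-off concrete, here is a short top-down merge sort sketch (O(n log n) comparisons in every case, paid for with an O(n) scratch buffer; illustrative only):

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted halves a[lo..mid) and a[mid..hi) through tmp. */
static void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  /* stable: ties keep the left element */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (size_t)(hi - lo) * sizeof a[0]);
}

/* Recursively sort the half-open range a[lo..hi). */
static void msort(int a[], int tmp[], int lo, int hi)
{
    int mid;
    if (hi - lo < 2)
        return;
    mid = lo + (hi - lo) / 2;
    msort(a, tmp, lo, mid);
    msort(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Top-down merge sort: guaranteed O(n log n) time, O(n) extra space. */
void merge_sort(int a[], int n)
{
    int *tmp = malloc((size_t)n * sizeof a[0]);
    if (!tmp)
        return;  /* allocation failure: leave the input untouched */
    msort(a, tmp, 0, n);
    free(tmp);
}
```

Unlike quicksort, the O(n log n) bound here holds for every input, which is why merge sort is often preferred for large data sets when the extra buffer is affordable.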

Pankti