When should one implement a simple or advanced sorting algorithm?

Question

Apart from the obvious "It's faster when there are many elements". When is it more appropriate to use a simple sorting algorithm (0(N^2)) compared to an advanced one (O(N log N))?

I've read quite a bit about for example insertion sort being preferred when you've got a small array that's nearly sorted because you get the best case N. Why is it not good to use quicksort for example, when you've got say 20 elements. Not just insertion or quick but rather when and why is a more simple algorithm useful compared to an advanced?

EDIT: If we're working with for example an array, does it matter which data input we have? Such as objects or primitive types (Integer).

With 20 elements you are not going to notice a difference in efficiency so you can use whatever algorithm you wish. Insertion sort it better for almost sorted lists because it will finish in N operations where as quick sort will always go nlogn — Mitch, Feb 19 '18 at 19:34
If it's something small like your example, a simple sort is just fine. But some server hosting contracts are based on processing hours and saving seconds off of larger queries/sorts can save $$$ in the long run. — Reed, Feb 19 '18 at 19:34
Sometimes is simpler to implement a inefficient algorithm, like sorting O(n^2) and a little bit harder to implement a O(n log n). If you don't need that much of efficiency because you have small inputs, you can prefer code that is simpler to maintain. — aeliton, Feb 19 '18 at 19:37
https://stackoverflow.com/questions/736920/is-there-ever-a-good-reason-to-use-insertion-sort — Josh Lee, Feb 19 '18 at 19:38
@aeliton I disbelieve. Can you name me a decent programming environment where there is not an efficient sort routine already implemented for your convenience? — btilly, Feb 19 '18 at 19:58
@btilly There's no decent programming environment without efficient sorting routines, as far as I know. My point is if you will implement algorithms yourself, and you know you'll only have small inputs, you can choose simplicity/maintainability over algorithm efficiency. — aeliton, Feb 20 '18 at 21:38
@aeliton And the result is that it is extremely hard to find a real use case where you should ever implement `sort` yourself. And if you find one, the situation is weird enough that generic advice is unlikely to be applicable. — btilly, Feb 20 '18 at 21:47
@btilly I think I write (at least part of) sorting routines with some frequency. For example, when I have a previously sorted list that needs to receive new elements and maintain all of them sorted. In this case I use (part of) insertion sort. But I agree with you that cases where the implementation of sorting algorithms is needed are rare. — aeliton, Feb 21 '18 at 23:41
@aeliton If possible I would still try to use a well-studied data structure that makes it easy to insert/delete/etc and keep sorted order. Like a skip list or btree. But that depends on the exact situation. — btilly, Feb 22 '18 at 02:26

score 4 · Answer 1 · answered Feb 19 '18 at 19:39

The big-oh notation captures the runtime cost of the algorithm for large values of N. It is less effective at measuring the runtime of the algorithm for small values.

The actual transition from one algorithm to another is not a trivial thing. For large N, the effects of N really dominate. For small numbers, more complex effects become very important. For example, some algorithms have better cache coherency. Others are best when you know something about the data (like your example of insertion sort when the data is nearly sorted).

The balance also changes over time. In the past, CPU speeds and memory speeds were closer together. Cache coherency issues were less of an issue. In modern times, CPU speeds have generally left memory busses behind, so cache coherency is more important.

So there's no one clear cut and dry answer to when you should use one algorithm over another. The only reliable answer is to profile your code and see.

For amusement: I was looking at the dynamic disjoint forest problem a few years back. I came across a state-of-the-art paper that permitted some operations to be done in something silly like O(log log N / log^4N). They did some truly brilliant math to get there, but there was a catch. The operations were so expensive that, for my graphs of 50-100 nodes, it was far slower than the O(n log n) solution that I eventually used. The paper's solution was far more important for people operating on graphs of 500,000+ nodes.

score 1 · Answer 2 · answered Feb 19 '18 at 19:37

1

When programming sorting algorithms, you have to take into account how much work would be put into implementing the actual algorithm vs the actual speed of it. For big O, the time to implement advanced algorithms would be outweighed by the decreased time taken to sort. For small O, such as 20-100 items, the difference is minimal, so taking a simpler route is much better.

answered Feb 19 '18 at 19:37

TMcSquared

107
14

1

The effort of calling a built-in sort routine is less work than implementing your own. You should always do the simplest efficient thing unless there is a very good specific reason not to. – btilly Feb 19 '18 at 19:56

score 1 · Answer 3 · answered Feb 19 '18 at 19:44

First of all O-Notation gives you the sense of the worst case scenario. So in case the array is nearly sorted the execution time could be near to linear time so it would be better than quick sort for example. In case the n is small enough, we do take into consideration other aspects. Algorithms such as Quick-sort can be slower because of all the recursions called. At that point it depends on how the OS handles the recursions which can end up being slower than the simple arithmetic operations required in the insertion-sort. And not to mention the additional memory space required for recursive algorithms.

score 1 · Answer 4 · answered Feb 19 '18 at 19:54

Better than 99% of the time, you should not be implementing a sorting algorithm at all.

Instead use a standard sorting algorithm from your language's standard library. In one line of code you get to use a tested and optimized implementation which is O(n log(n)). It likely implements tricks you wouldn't have thought of.

For external sorts, I've used the Unix sort utility from time to time. Aside from the non-intuitive LC_ALL=C environment variable that I need to get it to behave, it is very useful.

Any other cases where you actually need to implement your own sorting algorithm, what you implement will be driven by your precise needs. I've had to deal with this exactly once for production code in two decades of programming. (That was because for a complex series of reasons, I needed to be sorting compressed data on a machine which literally did not have enough disk space to store said data uncompressed. I used a merge sort.)

When should one implement a simple or advanced sorting algorithm?

4 Answers4