why are my students learning sort? why am i teaching this? i can think of academic or theoretical reasons... basically that we are constantly ordering things - either in their own right or as part of another algorithm. how about for my students, who may never have to write their own sort function?
I'll answer the question "why do we learn how to write a sort function?" Why do we learn to write anything that's already given to us by a library? Hashes, lists, queues, trees... why learn to write any of them?
The most important is to appreciate their performance consequences and when to use which one. For example, Ruby Arrays supply a lot of built in functionality. They're so well done and easy to use that it's easy to forget you're working with a list and write yourself a pile of molasses.
Look at this loop that finds a thing in a list and replaces it.
things.each { |thing|
idx = thing.index(marker)
thing[idx] = stuff
}
With no understanding of the underlying algorithms that seems perfectly sensible.
- For each list in the list of things.
- Find the item to replace.
- Insert a new item in its place.
Two steps per thing. What could be simpler? And when they run it with a small amount of test data it's fine. When they put it into production with a real amount of data and having to do it thousands of times per second it's dog slow. Why? Without an appreciation for what all those methods are doing under the hood, they cannot know.
things.each { |thing| # O(things)
idx = thing.index(marker) # O(thing)
thing[idx] = stuff # O(1)
}
Those deceivingly simple looking Array methods are their own hidden loops. In the worst case each one must scan the whole list. Loops in loops makes this exponentially slow, it's O(n*m). How slow? If things is 1000 items long, and each thing has 1000 items in it that's... 1000 * 1000 or 1,000,000 operations!
And this isn't nearly the amount of trouble students can get into, normally they write O(n!) loops. I actually find it hard to come up with an example I'm so ingrained against it.
But that only becomes apparent after you throw a ton of data at it. While you're writing it, how can you know?
How can they make it faster? Without understanding the other options available to you and their performance characteristics, like hashes and sets and trees, they cannot know. And experienced programmer would make one immediate change to the data structure and change things
to a list of sets.
things.each { |thing| # O(things)
thing.delete(marker) # O(1)
thing.add(stuff) # O(1)
}
This is much faster. Deleting and adding with an unordered set is O(1) so it's effectively free no matter how large thing
gets. Now if things
is 1000 items long, and each thing
has 1000 items in it that's 1000 operations. By using a more appropriate data structure I just sped up that loop by 1000 times. Really what I did is changed it from O(n*m) to O(n).
Another solid example is learning how to write a solid comparison function for multi-level data. Why is the Schwartzian transform fast? You can't appreciate that without understanding how sorting works.
You could simply be told these things, sorting is O(n log n), finding something in a list is O(n), and so on... but having to do it yourself gives you a visceral appreciation for what's going on under the hood. It makes you appreciate all the work a modern language does for you.
That said, there's little point in writing six different sort algorithms, or four different trees, or five different hash conflict resolution functions. Write one of each to appreciate them, then just learn about the rest so you know they exist and when to use them. 98% of the time the exact algorithm doesn't matter, but sometimes it's good to know that a merge sort might work better than a quick sort.
Because honestly, you're never going to write your own sort function. Or tree. Or hash. Or queue. And if you do, you probably shouldn't be. Unless you intend to be the 1% that writes the underlying libraries (like I do), if you're just going to write web apps and business logic, you don't need a full blown Computer Science education. Spend that time learning Software Engineering instead: testing, requirements, estimation, readability, communications, etc...
So when a student asks "why are we learning this stuff when it's all built into the language now?" (echos of "why do I have to learn math when I have a calculator?") have them write their naive loop with their fancy methods. Shove big data at it and watch it slow to a crawl. Then write an efficient loop with good selection of data structures and algorithms and show how it screams through the data. That's their answer.
NOTE: This is the original answer before the question was understood.
Most modern languages use quicksort as their default sort, but usually modified to avoid the O(n^2) worst case. Here's the BSD man page on their implementation of qsort_r()
. Ruby uses qsort_r
.
The qsort() and qsort_r() functions are an implementation of C.A.R. Hoare's ``quicksort'' algorithm, a variant of partition-exchange sorting; in particular, see D.E. Knuth's Algorithm Q. Quicksort takes O N lg N average time. This implementation uses median selection to avoid its O N**2 worst-case behavior.
PHP also uses quicksort, though I don't know which particular implementation.
Perl uses its own implementation of quicksort by default. But you can also request a merge sort via the sort pragma.
In Perl versions 5.6 and earlier the quicksort algorithm was used to implement "sort()", but in Perl 5.8 a mergesort algorithm was also made available, mainly to guarantee worst case O(N log N) behaviour: the worst case of quicksort is O(N**2). In Perl 5.8 and later, quicksort defends against quadratic behaviour by shuffling large arrays before sorting.
Python since 2.3 uses Timsort and is guaranteed to be stable. Any software written in Python (Django) is likely to also use the default Timsort.
Javascript, really the ECMAScript specification, does not say what type of sorting algorithm to use for Array.prototype.sort. It only says that it's not guaranteed to be stable. This means the particular sorting algorithm is left to the Javascript implementation. Like Python, any Javascript frameworks such as React or Leaflet are likely to use the built in sort.
Visual Basic for Applications (VBA) comes with NO sorting algorithm. You have to write your own. This is a bizarre oversight for any language, but particularly one that's designed for business use and spreadsheets.