0

I am a TA for algorithms class and we are doing a unit on sorting, and I wanted to a discussion of quicksort. There are many good theoretical discussions of sorting methods on the web showing which one is better in which circumstances...

What are some real-life instances of quick-sort I can give to my student. Especially in the field of Web Development.

  • does Django use quick-sort?
  • does React?
  • does Leaflet use any kind of sort?

In fact, I don't really care about quicksort particularly. Any sorting method will do if I can point to a specific library that uses it. Thanks.

Schwern
  • 153,029
  • 25
  • 195
  • 336
john mangual
  • 7,718
  • 13
  • 56
  • 95
  • 1
    I'm not entirely certain what you're asking for. Most programming languages provide a sort function for lists. Most libraries use that function, they don't write their own. – Schwern Sep 29 '16 at 02:54
  • basically, why are my students learning sort? why am i teaching this? i can think of academic or theoretical reasons... basically that we are constantly ordering things - either in their own right or as part of another algorithm. how about for my students, who may never have to write their own sort function? @schwern – john mangual Sep 29 '16 at 04:11
  • 1
    Oh! "Why should one learn how to write various sort functions when they're already built in?" Something like that? – Schwern Sep 29 '16 at 04:15
  • I have updated [my answer](http://stackoverflow.com/a/39760602/14660) to cover what you really want to know. – Schwern Sep 29 '16 at 16:53

3 Answers3

1

Almost any table is sorted. Most web apps are backed by SQL database and the actual sorting is performed inside that SQL database. For example SQL query SELECT id, date, total FROM orders ORDER BY date DESC. This kind of sorting uses already sorted database indexes, which are mostly implemented using B-trees (or data structures inspired by B-trees). But if data needs to be sorted on the fly then I think quicksort is usually used.

Sorting, merging of sorted files and binary search in sorted files is often used in big data processing, analytics, ad dispatching, fulltext search... Even Google results are sorted :)

Sometimes you don't need sort, but partial sort, or min-heap. For example in Dijkstra's algorithm for finding shortest path. Which is used (or can be used, or I would use it :) ) for example in route planning (Google Maps).

Messa
  • 24,321
  • 6
  • 68
  • 92
1

why are my students learning sort? why am i teaching this? i can think of academic or theoretical reasons... basically that we are constantly ordering things - either in their own right or as part of another algorithm. how about for my students, who may never have to write their own sort function?

I'll answer the question "why do we learn how to write a sort function?" Why do we learn to write anything that's already given to us by a library? Hashes, lists, queues, trees... why learn to write any of them?

The most important is to appreciate their performance consequences and when to use which one. For example, Ruby Arrays supply a lot of built in functionality. They're so well done and easy to use that it's easy to forget you're working with a list and write yourself a pile of molasses.

Look at this loop that finds a thing in a list and replaces it.

things.each { |thing|
    idx = thing.index(marker)
    thing[idx] = stuff
}

With no understanding of the underlying algorithms that seems perfectly sensible.

  • For each list in the list of things.
    1. Find the item to replace.
    2. Insert a new item in its place.

Two steps per thing. What could be simpler? And when they run it with a small amount of test data it's fine. When they put it into production with a real amount of data and having to do it thousands of times per second it's dog slow. Why? Without an appreciation for what all those methods are doing under the hood, they cannot know.

things.each { |thing|            # O(things)
    idx = thing.index(marker)      # O(thing)
    thing[idx] = stuff             # O(1)
}

Those deceivingly simple looking Array methods are their own hidden loops. In the worst case each one must scan the whole list. Loops in loops makes this exponentially slow, it's O(n*m). How slow? If things is 1000 items long, and each thing has 1000 items in it that's... 1000 * 1000 or 1,000,000 operations!

And this isn't nearly the amount of trouble students can get into, normally they write O(n!) loops. I actually find it hard to come up with an example I'm so ingrained against it.

But that only becomes apparent after you throw a ton of data at it. While you're writing it, how can you know?

How can they make it faster? Without understanding the other options available to you and their performance characteristics, like hashes and sets and trees, they cannot know. And experienced programmer would make one immediate change to the data structure and change things to a list of sets.

things.each { |thing|    # O(things)
    thing.delete(marker)   # O(1)
    thing.add(stuff)       # O(1)
}

This is much faster. Deleting and adding with an unordered set is O(1) so it's effectively free no matter how large thing gets. Now if things is 1000 items long, and each thing has 1000 items in it that's 1000 operations. By using a more appropriate data structure I just sped up that loop by 1000 times. Really what I did is changed it from O(n*m) to O(n).

Another solid example is learning how to write a solid comparison function for multi-level data. Why is the Schwartzian transform fast? You can't appreciate that without understanding how sorting works.

You could simply be told these things, sorting is O(n log n), finding something in a list is O(n), and so on... but having to do it yourself gives you a visceral appreciation for what's going on under the hood. It makes you appreciate all the work a modern language does for you.

That said, there's little point in writing six different sort algorithms, or four different trees, or five different hash conflict resolution functions. Write one of each to appreciate them, then just learn about the rest so you know they exist and when to use them. 98% of the time the exact algorithm doesn't matter, but sometimes it's good to know that a merge sort might work better than a quick sort.

Because honestly, you're never going to write your own sort function. Or tree. Or hash. Or queue. And if you do, you probably shouldn't be. Unless you intend to be the 1% that writes the underlying libraries (like I do), if you're just going to write web apps and business logic, you don't need a full blown Computer Science education. Spend that time learning Software Engineering instead: testing, requirements, estimation, readability, communications, etc...

So when a student asks "why are we learning this stuff when it's all built into the language now?" (echos of "why do I have to learn math when I have a calculator?") have them write their naive loop with their fancy methods. Shove big data at it and watch it slow to a crawl. Then write an efficient loop with good selection of data structures and algorithms and show how it screams through the data. That's their answer.


NOTE: This is the original answer before the question was understood.

Most modern languages use quicksort as their default sort, but usually modified to avoid the O(n^2) worst case. Here's the BSD man page on their implementation of qsort_r(). Ruby uses qsort_r.

The qsort() and qsort_r() functions are an implementation of C.A.R. Hoare's ``quicksort'' algorithm, a variant of partition-exchange sorting; in particular, see D.E. Knuth's Algorithm Q. Quicksort takes O N lg N average time. This implementation uses median selection to avoid its O N**2 worst-case behavior.

PHP also uses quicksort, though I don't know which particular implementation.

Perl uses its own implementation of quicksort by default. But you can also request a merge sort via the sort pragma.

In Perl versions 5.6 and earlier the quicksort algorithm was used to implement "sort()", but in Perl 5.8 a mergesort algorithm was also made available, mainly to guarantee worst case O(N log N) behaviour: the worst case of quicksort is O(N**2). In Perl 5.8 and later, quicksort defends against quadratic behaviour by shuffling large arrays before sorting.

Python since 2.3 uses Timsort and is guaranteed to be stable. Any software written in Python (Django) is likely to also use the default Timsort.

Javascript, really the ECMAScript specification, does not say what type of sorting algorithm to use for Array.prototype.sort. It only says that it's not guaranteed to be stable. This means the particular sorting algorithm is left to the Javascript implementation. Like Python, any Javascript frameworks such as React or Leaflet are likely to use the built in sort.

Visual Basic for Applications (VBA) comes with NO sorting algorithm. You have to write your own. This is a bizarre oversight for any language, but particularly one that's designed for business use and spreadsheets.

Schwern
  • 153,029
  • 25
  • 195
  • 336
0

As pointed out by Schwern, the sorting is almost always provided by the programming language or its implementation engine, and libraries / frameworks just use that algorithm, with a custom comparison function when they need to sort complex objects.

Now if your objective is to have a real life example in the Web context, you could actually use on the contrary the "lack of" sorting method in SVG, and make an exercise out of it. Unlike other DOM elements, an SVG container paints its children in the order they are appended, irrespective of any "z-index" equivalent. So to implement a "z-index" functionality, you have to re-order the nodes yourself.

And to avoid just using a custom comparison function and relying on array.sort, you could add extra constraints, like stability, typically to preserve the current order of nodes with the same "z-index".

Since you mention Leaflet, one of the frustration with the pre 1.0 version (e.g. 0.7.7), was that all vector shapes are appended into the same single SVG container, without any provided sorting functionality, except for bringToFront / bringToBack.

ghybs
  • 47,565
  • 6
  • 74
  • 99