2

I found inventor of flashsort claiming that it is faster than quicksort for uniformly distributed data and most of real world data is close to uniform. Most of time while working on projects I found data are close to uniform. For example, I never encounter data like [1,3,500,10000,100000000,99999090990] on a daily basis. My data may look like [300,200,500,906,1500] and as n grows we will get more uniform data. After reading article I found that if any class contain too much data then we can recursively call flashsort to divide array further.

Am I missing something ? Why we are not using flashsort for integer data first and if it goes in worst case then we can use sort like introsort to avoid O(n^2) ?

Ilya
  • 5,377
  • 2
  • 18
  • 33
user3811315
  • 151
  • 7
  • 3
    FYI more and more languages are using [Timsort](https://en.wikipedia.org/wiki/Timsort) by default these days, because it handles real-world datasets much better than something like a naive quicksort implementation. I suspect things like flashsort and radix sort aren't used by default very often because they're only efficient for specific data sets, and standard libraries generally include a small number of more general-purpose algorithms. More specialized software (like database engines) probably do use more specialized algorithms like this where it makes sense. – Jeremy Jan 06 '16 at 19:18
  • @JeremyBanks there must be some other reason for not including flash sort in library sort algorithm.java library use insertion sort when n is small and merger sort for medium and quicksort when n is large.So why to use sorting algorithm of o(nlogn) when flashsort can give same result in o(n) ? – user3811315 Jan 06 '16 at 19:37
  • 1
    If the goal is to sort integer data, than radix sort is fastest, although it needs a second array and a small matrix of counts that get converted into indices, so space overhead is O(n), but on a typical PC, this isn't an issue anymore. – rcgldr Jan 06 '16 at 19:38
  • 2
    In my (admittedly subjective) experience, ordering by an integer key is actually quite rare. When integer entities are present they are often already the key (the data is pre-sorted) or represent a flag or a status of some kind that you typically don't sort on. It is a much more common request to have data sorted by a character, float, or temporal field. In my view it is actually quite silly that sorting algorithms are often bench-marked using a randomly ordered array of unique integers because that type of input hardly ever occurs in the wild. – 500 - Internal Server Error Jan 06 '16 at 19:40
  • @rcgldr No, when data is unique radix sort will take o(nlogn) http://stackoverflow.com/questions/3539265/why-quicksort-is-more-popular-than-radix-sort look at last answer in link by abhinav Chauhan – user3811315 Jan 06 '16 at 19:42
  • @user3811315 - radix sort of integers (not keys or strings or ...) using a second array and matrix for counts / offsets, sometimes called radix / counting sort, is O(n). If sorting by 8 bits at a time, then it takes 4 passes to sort 32 bit integers, and 8 passes to sort 64 bit integers, plus the initial pass to generate the counts (histogram) that get converted into indices. – rcgldr Jan 06 '16 at 19:46
  • @500-InternalServerError yes,you are right but we can't just ignore the other data types. – user3811315 Jan 06 '16 at 19:49
  • @user3811315: No argument there. – 500 - Internal Server Error Jan 06 '16 at 19:56
  • @rcgldr: Radix sort's worst case performance is really bad when you get into 32 or 64 bit integers. And on another note, it can be done in-place. Edit: [I wrote an implementation for one](http://stackoverflow.com/a/29702954/27678) a little while ago here on StackOverflow. – AndyG Jan 06 '16 at 19:59
  • @AndyG - on my system (Intel 2600K), radix sort on 4 million pseudo random 64 bit unsigned integers takes about 0.220 seconds, while quick sort take more than 0.300 seconds. I'm not sure what you mean by worst case. Radix sort by byte makes one read pass to generate a matrix of counts that get converted into indices, then 8 passes to sort the data. The writes won't be cache friendly, but the reads will be. – rcgldr Jan 06 '16 at 20:05
  • Note that flash sort is similar to in place MSD first radix sort, but instead of generating counts to determine exact bin size, it scans the array to determine min and max values, then it approximates the bin boundaries based on relative value of each element, and then has to sort the bins using a different algorithm, while radix sort just continues with the next most significant digit till done. The example code is for integers, but flash sort could handle other key types if they can be converted into relative integer values. – rcgldr Jan 06 '16 at 20:06
  • @rcgldr so why flashsort or radix sort is not used in practice in any library sort function ? – user3811315 Jan 06 '16 at 20:11
  • @rcgldr may be reason for not using radix sort is space but flash sort doesn't have such problem – user3811315 Jan 06 '16 at 20:15
  • 1
    @user3811315 - Space not the issue since std::stable_sort() allocates temp array 1/2 the size of data and std::stable_partition() allocates temp array same size as data. Jeremy Banks answered this earlier in his comment: the libraries use a small number of general purpose algorithms. – rcgldr Jan 06 '16 at 23:22

0 Answers0