4

I have been trying various sorting algorithms for the past few days, starting with 1) algorithms with O(n^2) time complexity, then 2) O(n log n) algorithms, with both in-place and out-of-place sorting techniques.

I am wondering if there is any sorting algorithm which sorts in linear time or less. I have heard about radix sort, which in the best case approaches linear time at the cost of some extra space. Can someone enlighten me?

dharam
  • What type of data are you sorting, and how much do you have? "Fastest" != algorithmic complexity... – Reed Copsey May 15 '12 at 16:09
  • Radix sort is very fast. Some good info and examples here: http://en.wikipedia.org/wiki/Radix_sort – Jonathan M May 15 '12 at 16:10
  • Sorry for the misleading heading... What I meant by fastest is having an Order n complexity. I am trying to sort a list of Strings. The list contains around 10^4 elements. – dharam May 15 '12 at 16:11
  • 2
    Have you already looked at http://en.wikipedia.org/wiki/Sorting_algorithm ? It has a pretty in-depth list with both memory usage as well as complexities – Jon May 15 '12 at 16:13

6 Answers

4

You can't ever sort in less than O(N), because you have to look at all N elements just to confirm that the list is sorted - so that's O(N) right there. You also can't sort faster than O(N log N) if you sort by comparing elements against each other - but if you know something about your data, you can. If, for example, you know that your data is English strings, you might be able to put them into buckets before sorting: put all strings starting with A into one bucket, those starting with B into another, and so forth. This will be fast. You might need to make each bucket fairly large, though - perhaps large enough to hold 1000 strings, since not all buckets will contain the same number of strings.

Then sort the individual buckets, which will be fast.

For a uniform distribution of data (i.e. 400 strings starting with each letter, which of course you won't have), I would guesstimate that this would be O(N) + O(N log(N/M)), where M is the number of buckets.

You can obviously have nested buckets for the second letter, but the more buckets you have, the bigger your space requirements. Expanding buckets on the fly costs execution time, so you want to make them sufficiently large to start with - which means many of them will end up quite a bit bigger than they need to be, since you don't know everything about the distribution of your data.
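The first-letter bucketing described above can be sketched as follows. This is an illustrative sketch only: the 26-bucket layout and the handling of strings that don't start with an ASCII letter are assumptions, not part of the original answer.

```python
# Distribute English strings into 26 buckets by first letter, then
# sort each (small) bucket individually.

def bucket_sort_strings(strings):
    buckets = [[] for _ in range(26)]
    other = []  # strings not starting with an ASCII letter (assumption: sorted first)
    for s in strings:
        c = s[:1].lower()
        if "a" <= c <= "z":
            buckets[ord(c) - ord("a")].append(s)
        else:
            other.append(s)
    result = sorted(other)
    for b in buckets:
        b.sort()  # each bucket is small, so this step is fast
        result.extend(b)
    return result

print(bucket_sort_strings(["banana", "apple", "cherry", "apricot"]))
# -> ['apple', 'apricot', 'banana', 'cherry']
```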

Library sort might be worth looking at as well.

Airsource Ltd
  • 1
    Actually if you use counting sort (the generalization of bucket sort), you get a runtime behavior of O(n + M) (M being the number of distinct values), which really is O(n). Radix sort has O(nk), with k being the number of digits. Pure bucket sort degenerates to O(n^2), though. – Voo May 15 '12 at 18:47
3

The fastest general-purpose sort is merge sort, which can take advantage of the map/reduce pattern (which quicksort cannot).

But if you know something about your data, datasets can in some cases be sorted even faster than that.
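A minimal top-down merge sort, written to highlight the divide (map) and merge (reduce) steps the answer refers to - the two halves are independent, so they could be sorted in parallel:

```python
# Merge two already-sorted lists into one sorted list (the "reduce" step).
def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

# Recursively split, sort each half independently, then merge.
def merge_sort(xs):
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

print(merge_sort([5, 2, 9, 1, 5, 6]))  # -> [1, 2, 5, 5, 6, 9]
```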

You cannot sort faster than O(n); that makes no sense, since you must deal with each element at least once.

In response to your mentioning radix sort:

(from wikipedia)

Radix sort's efficiency is O(k·n) for n keys which have k or fewer digits. Sometimes k is presented as a constant, which would make radix sort better (for sufficiently large n) than the best comparison-based sorting algorithms, which are all O(n·log(n)). However, in general k cannot be considered a constant. In particular, under the common (but sometimes implicit) assumption that all keys are distinct, then k must be at least of the order of log(n), resulting in no better results than other sorts.
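For reference, here is a sketch of LSD radix sort for non-negative integers in base 10, illustrating the O(k·n) behaviour quoted above (k passes, one per digit). The base and the restriction to non-negative values are simplifying assumptions.

```python
# LSD radix sort: make one stable bucketing pass per decimal digit,
# from least significant to most significant.

def radix_sort(nums):
    if not nums:
        return []
    out = list(nums)
    exp = 1  # current digit position: 1, 10, 100, ...
    while max(out) // exp > 0:
        buckets = [[] for _ in range(10)]
        for n in out:
            buckets[(n // exp) % 10].append(n)  # stable per-digit pass
        out = [n for b in buckets for n in b]
        exp *= 10
    return out

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# -> [2, 24, 45, 66, 75, 90, 170, 802]
```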

ControlAltDel
2

Some sorting algorithms that run in linear time are counting sort, radix sort and bucket sort. The gotcha with these algorithms is that they require assumptions about the input. Counting sort and radix sort assume the input consists of integers in a small range, while bucket sort assumes the input is generated by a random process that distributes elements uniformly over an interval. Pages 3-6 give a nice outline of the above algorithms.
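A minimal counting sort sketch, assuming (as the answer says) integers in a small known range [0, max_value], shows the O(n + M) behaviour directly:

```python
# Counting sort: tally each value, then emit values in order.
# Runs in O(n + M), where M = max_value + 1 possible values.

def counting_sort(nums, max_value):
    counts = [0] * (max_value + 1)
    for n in nums:
        counts[n] += 1  # O(n): tally each value
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)  # O(n + M): emit in sorted order
    return out

print(counting_sort([4, 2, 2, 8, 3, 3, 1], max_value=8))
# -> [1, 2, 2, 3, 3, 4, 8]
```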

iCanHasFay
0

If you want to know about the fastest sorting technique for integer values, I would suggest you refer to the following link: https://github.com/fenilgmehta/Fastest-Integer-Sort

It uses radix sort and counting sort for large arrays, and merge sort along with insertion sort for small arrays. According to the project's benchmarks, this sorting algorithm is considerably faster than C++ std::sort for integral values.

It is 6 times faster than C++ STL std::sort for "int64_t array[10000000]".

Stephen Rauch
-1

Check out NoChop when used as a sort engine (www.agdresearch.com). It decomposes keys into significant bytes and holds these in a sparse matrix. This allows for 256 children per node instead of the usual 2 (e.g. a BTree), and so is theoretically up to 8 times faster (2^8 = 256). In practice it is already 50% faster than QuickSort, 350% faster than BinarySearch, and 100% faster than BTree (balanced tree version).

Best regards - Andrew

  • Are you [affiliated](https://stackoverflow.com/help/promotion) with *AGD Research* in any way? If so, disclose it up-front in the post body. – greybeard Aug 23 '22 at 11:09
  • [Bayer-McCreight trees](https://en.m.wikipedia.org/wiki/BTree) are notorious for multi-way branching. There even is an ordering error in *AGD Research*'s "presentation" of a "BTree" (and an irritating mix of hyperlinks to B-tree information and text about "binary tree"s (the search varieties are often subsumed under *BST*)). "NoChop STree" looks a re-invention of the [radix tree](https://en.m.wikipedia.org/wiki/Radix_tree). – greybeard Aug 23 '22 at 11:27
-2

(Edit to my previous bad post, sorry everybody)

One way to improve performance of sorting algorithms is parallel processing:

Parallel Sort Algorithm

In that post, the performance of sequential and parallel QuickSort is compared using a list of integers, and performance improves significantly on a dual-core machine. Merge sort can even run in O(log n) time on a system with n processors, according to this article:

http://en.wikipedia.org/wiki/Merge_sort#Parallel_processing

It might sound unrealistic to have that many cores available, but with infrastructure as a service (Amazon Cloud, Azure, ...) it can be an option for mission-critical implementations.
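The chunk-then-merge idea behind parallel sorting can be sketched as follows. This is a rough illustration, not the implementation from the linked post: ThreadPoolExecutor is used for brevity, and a real speed-up for CPU-bound pure-Python code would need processes rather than threads.

```python
# Split the input into chunks, sort the chunks concurrently, then
# perform a k-way merge of the sorted runs.
from concurrent.futures import ThreadPoolExecutor
import heapq

def parallel_sort(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))  # sort chunks concurrently
    return list(heapq.merge(*runs))            # k-way merge of sorted runs

print(parallel_sort([9, 3, 7, 1, 8, 2, 6, 4, 5]))
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```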

Ali Ashraf
j0aqu1n