I'm using Collections.sort() to sort a LinkedList whose elements implement the Comparable interface, so they are sorted in their natural order. The Javadoc says this method uses the merge sort algorithm, which has n*log(n) performance.

My question is: is there a more efficient algorithm to sort my LinkedList?

The list could be very large, and sorting will also be very frequent.

– rasgo
  • Do you mean memory or CPU efficiency? – Thomas Jung May 21 '10 at 16:38
  • What does "very high" mean? 1000, 1000000, 100000000? – Bozho May 21 '10 at 16:38
  • Because, as many people have noted, O(n log n) is the provable lower bound on comparison-based sorting, you can't really do much better. I think the only way people could help you more is if you provided more information about what you're sorting, or why you need to sort it, so we could recommend better solutions based on your specific case. =] – iffy May 21 '10 at 17:27
  • @iffy, you're forgetting the bogosort: that's O(1), isn't it? – CPerkins May 21 '10 at 19:07
  • I believe the algorithm has been replaced with a dual-pivot quicksort in JDK 7. – Kevin Bourrillion May 22 '10 at 00:24
  • Even if nothing else, you at least want to change LinkedList to ArrayList so it can be sorted in place. – Kevin Bourrillion May 22 '10 at 00:25

5 Answers

O(N log N) is very good asymptotically. That said, there are linear-time O(N) non-comparison-based sorts, e.g. counting sort and bucket sort. These are useful when, for example, you're sorting millions and millions of integers, but they all fall in a small range such as 1..10.
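For illustration, here is a minimal counting sort sketch in Java, under the assumption that every value falls in a small fixed range (1..10 in this hypothetical):

    // Counting sort sketch: assumes every value falls in the range 1..10.
    // Runs in O(N + K), where K is the range size, rather than O(N log N).
    static int[] countingSort(int[] values) {
        int[] counts = new int[11];            // counts[v] = occurrences of v, for v in 1..10
        for (int v : values) {
            counts[v]++;
        }
        int[] sorted = new int[values.length];
        int i = 0;
        for (int v = 1; v <= 10; v++) {        // emit each value as often as it occurred
            while (counts[v]-- > 0) {
                sorted[i++] = v;
            }
        }
        return sorted;
    }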

Also, if the list is "almost sorted", the otherwise quadratic insertion sort is reported to actually do better in some scenarios.
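For a sense of why: insertion sort's inner loop exits almost immediately when each element is already near its final position, so the total work approaches O(N) on nearly sorted input. A plain sketch:

    // Insertion sort sketch: O(N^2) in the worst case, but close to O(N)
    // on almost-sorted input, because the shifting loop exits early.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {     // shift larger elements one slot right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }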

Whether this is applicable, or even worth implementing, depends on your profiling results. I'd say that unless profiling shows the sort to be a bottleneck, don't worry about it.

– polygenelubricants

If you say the list will be sorted "very frequently", you should consider keeping the list in a sorted state all the time, for example by using a tree instead of a LinkedList. Maybe you can even use a SortedSet instead of a List, if you don't have any duplicate values and don't need any List operations (as you are sorting them all the time anyway). Check the TreeSet class, an implementation of SortedSet.

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

If you want to iterate over this "list" (which is actually a Set), you can use the class's Iterator.

Returns an iterator over the elements in this set in ascending order.

If you have duplicate values inside the List, you have to use some trick, such as wrapping each value in a new class that also carries a tie-breaking delta for sorting otherwise-equal objects.
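A minimal sketch of that trick, assuming int values; the Entry wrapper and its seq field are hypothetical names, and the sequence number exists only as a tie-breaker so equal values don't collapse into a single entry (a TreeSet treats compare() == 0 as a duplicate):

    import java.util.Comparator;
    import java.util.TreeSet;

    public class SortedStore {
        // Hypothetical wrapper: the value plus a tie-breaking sequence number.
        static final class Entry {
            final int value;
            final long seq;
            Entry(int value, long seq) { this.value = value; this.seq = seq; }
        }

        private long nextSeq = 0;
        private final TreeSet<Entry> entries = new TreeSet<>(
                Comparator.comparingInt((Entry e) -> e.value)
                          .thenComparingLong(e -> e.seq));

        public void add(int value) {             // O(log n); the set stays sorted
            entries.add(new Entry(value, nextSeq++));
        }

        public void printAscending() {           // the iterator is already in ascending order
            for (Entry e : entries) {
                System.out.println(e.value);
            }
        }
    }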

– Progman

There is no general-purpose sorting algorithm better than n*log(n), and that is already quite fast. By general-purpose I mean one that makes no assumptions about special properties of your data.

– Petar Minchev
  • It will dump the LinkedList into an array, which may take some time. This is not implemented as efficiently as for ArrayList. I suppose this is the question. – Thomas Jung May 21 '10 at 16:42

I am experimenting with large data sets (GBs of data) and have implemented a merge sort (there is a good example at googlecode). However, I am using Collections.sort() to pre-sort my temporary buffers, and in my experience Collections.sort() gets ridiculously slow beyond a certain threshold of data. With an auxiliary buffer of 96MB I can sort one of those buffers in about 30 seconds (note: this heavily depends on the comparators you use; I use a custom column layout with a quite complex column parser), but increasing the chunk size to 128MB makes the time jump to over 3 minutes. This bears no relation to the linear (or near-linear) behavior I can observe for smaller chunks. The impact is so large that a merge sort with smaller buffers is in almost (?) all cases faster than an in-memory sort using a 128MB buffer.

To make this short: merge sort is the way to go for large data sets beyond the 100MB boundary. I cannot really answer why that is, and those numbers might even be machine dependent (mine is OS X on a 2.6GHz i7 with 16GB of memory).

– jschober

In terms of sorting the list: no, all comparison-based sorts on general data are O(N log(N)).

If your re-sorting is due to insertions, then you can batch your insertions and then merge-sort the batch with the main list: if you have B new items, you sort them in O(B log(B)) and then do a single-level merge of the two lists, which is O(N+B).
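A sketch of that batch-and-merge idea, assuming Comparable elements (the name insertBatch is illustrative); it merges via iterators, so the pass stays O(N+B) even when the main list is a LinkedList:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Iterator;
    import java.util.List;

    // Sort the B new items in O(B log B), then merge them with the
    // already-sorted main list in a single O(N + B) pass.
    static <T extends Comparable<T>> List<T> insertBatch(List<T> sortedMain, List<T> batch) {
        Collections.sort(batch);                 // O(B log B)
        List<T> merged = new ArrayList<>(sortedMain.size() + batch.size());
        Iterator<T> a = sortedMain.iterator();
        Iterator<T> b = batch.iterator();
        T x = a.hasNext() ? a.next() : null;
        T y = b.hasNext() ? b.next() : null;
        while (x != null && y != null) {         // standard two-way merge
            if (x.compareTo(y) <= 0) {
                merged.add(x);
                x = a.hasNext() ? a.next() : null;
            } else {
                merged.add(y);
                y = b.hasNext() ? b.next() : null;
            }
        }
        while (x != null) { merged.add(x); x = a.hasNext() ? a.next() : null; }
        while (y != null) { merged.add(y); y = b.hasNext() ? b.next() : null; }
        return merged;
    }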

If your re-sorting is due to changes in the values of the items, you might be able to do similar batching if you change the mutable values into immutable ones and treat the changes as a batch of insertions and deletions. Otherwise, you won't be able to avoid sorting the whole list.

If your requirements allow it, there are various non-linked-list structures, such as TreeSet, which maintain a sorted order more efficiently, but which will fail if the values are mutable.

– Pete Kirkham