1

I have a list of elements that is almost in the correct order and the elements are just off by a relatively small amount of places compared to their correct position (e.g. no element that is supposed to be in the front of the list is in the end).

< TL;DR >

Practical origin: I have an incoming stream of UDP-Packages that contain signals all marked with a certain timestamp. Evaluating the data has shown, that the packages have not been send (or received) in the right order, so that the timestamp is not constantly increasing but jittering a bit. To export the data I need to sort it in advance.

< /TL;DR >

I want to use std::list.sort() to sort this list.

What is the sorting algorithm used by std::list.sort() and how is it affected by the fact that the list is almost sorted. I have a "feeling", that a divide-and-conquer based algorithm might profit from it.

Is there a more efficient algorithm for my quite specific problem?

Pete Becker
  • 74,985
  • 8
  • 76
  • 165
mxcd
  • 1,954
  • 2
  • 25
  • 38
  • There are [many sorting algorithms out there with different performance characteristics for different situations](https://github.com/Morwenn/cpp-sort/wiki/Benchmarks). You will have to benchmark various implementations and see which one is best. After you made sure it actually matters. – nwp Sep 04 '17 at 14:47
  • https://shreevatsa.wordpress.com/2008/05/16/premature-optimization-is-the-root-of-all-evil/ - do benchmarking and if it s bottleneck start worrying about algorithm – R2RT Sep 04 '17 at 14:47
  • Bubble sort may actually be useful here. From Wikipedia: "In computer graphics bubble sort is popular for its capability to detect a very small error (like swap of just two elements) in almost-sorted arrays and fix it with just linear complexity (_2n_)." – CompuChip Sep 04 '17 at 14:47
  • Why do you use a list? –  Sep 04 '17 at 17:29
  • @manni66 After seeing "Fire Lancer's" answer, I switched to a vector and do the inserting from the back for every item – mxcd Sep 04 '17 at 17:31
  • Dissection of the algorithm is somewhat pointless, as the guarantees mandated by the standard are more relevant. `std::list::sort` mandates complexity O(NlogN) or better, as well as preservation of all iterators and stability (a stable sort), and absolute avoidance of object-copies. *Profile*. If the performance is adequate, find something else to worry about. – WhozCraig Sep 07 '17 at 08:25

4 Answers4

1

It is not defined which algorithm uses, allthough it should be around N log N on average, such as quicksort.

If you are appending packets to the end of the "queue" as you consume them, so you want the queue always sorted, then you can expect any new packets correct position to nearly always be near the "end".

Therefore, rather than sorting the entire queue, just insert the packet into the correct position. Start at the back of the queue and compare the existing packets timestamp with those already there, insert it after the first packet with a smaller timestamp (likely to always be the end) or the front if there is no such packet in the event things are greatly out of order.

Alternatively, if you want to add all packets in order and then sort it, Bubble Sort should be fairly optimal because the list should still be nearly sorted already.

Fire Lancer
  • 29,364
  • 31
  • 116
  • 182
  • The OP is not asking about `std::sort` (which is probably quicksort), but the `std::list` member function called `sort` (which is almost certainly *not* quicksort). It is true that average number of comparisons is required to be `N log N` though. – Martin Bonner supports Monica Sep 04 '17 at 15:07
  • Yes, quicksort was a bad example pick on my part, but i didnt ever mention `std::sort`. (Infact, you cant use `std::sort` with a list because its not random access, the same reason *some* Quicksort implementations may not be suitable) – Fire Lancer Sep 04 '17 at 15:12
  • I jumped from quicksort to `std::sort`, because I assumed you had done the reverse. My bad. – Martin Bonner supports Monica Sep 04 '17 at 15:14
1

If every element is within k positions of its proper place, then insertion sort will take less than kN comparisons and swaps/moves. It's also very easy to implement.

Compare this to the N*log(N) operations required by merge sort or quick sort to see if that will work better for you.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
0

On mostly-sorted data Insertion Sort and Bubble Sort are among the most common ones that perform the best.

Live demo

Note also that having a list structure puts an extra constraint on indexed access, so algorithms that require indexed access will perform extra poorly. Therefore insertion sort is an even better fit since it needs only sequential access.

rustyx
  • 80,671
  • 25
  • 200
  • 267
0

In the case of Visual Studio prior to 2015, a bottom up merge sorting using 26 internal lists is used, following the algorithm shown in this wiki article:

https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists

Visual Studio 2015 added support for not having a default allocator. The 26 internal lists initializers could have been expanded to 26 instances of initializers with user specified allocators, specifically: _Myt _Binlist[_MAXBINS+1] = { _Myt(get_allocator()), ... , _Myt(get_allocator())};, but instead someone at Microsoft switched to a top down merge sort based on iterators. This is slower, but has the advantage that it doesn't need special recovery if the compare throws an exception. The author of that change pointed out the fact that if performance is a goal, then it's faster to copy the list to an array or vector, sort the array or vector, then create a new list from the array or vector.

There's a prior thread about this.

`std::list<>::sort()` - why the sudden switch to top-down strategy?

In your case something like an insertion sort on a doubly linked list should be faster, if an out order node is found, remove the from the list, scan backwards or forwards to the proper spot and insert the node back into the list. If you want to used std::list, you can use iterators, erase, and emplace to "move" nodes, but that involves freeing and reallocating a node for each move. It would be faster to implement this with your own doubly linked list, in which case you can just manipulate the links, avoiding the freeing and reallocation of memory if using std::list.

rcgldr
  • 27,407
  • 3
  • 36
  • 61