6

I find myself with this problem quite often : given a sequence, find the k-smallest element.The question is not that hard , but what i am looking for is an "idiomatic" way of doing it that is both safe (few place for error) and that communicate intent well. So end-up doing is sorting the sequence , then taking the first k element :

std::sort(container.begin(),container.end());
std::vector<T> k_smallest(container.begin(),container.begin() + k);

This seems to me both safe and easy to understand, but the complexity in here is nlogn + k, instead of just n. How do you guys do this, is there an idomatic way (using some obscure function from maybe) that would give the optimal complexity without having to re-implement the wheel

amit
  • 175,853
  • 27
  • 231
  • 333
GreyGeek
  • 918
  • 5
  • 14

3 Answers3

16

std::nth_element() - linear complexity on average.

nth_element is a partial sorting algorithm that rearranges elements in [first, last) such that:

  • The element pointed at by nth is changed to whatever element would occur in that position if [first, last) was sorted.
  • All of the elements before this new nth element are less than or equal to the elements after the new nth element.
Martin Ba
  • 37,187
  • 33
  • 183
  • 337
Maxim Egorushkin
  • 131,725
  • 17
  • 180
  • 271
  • 2
    +1 It's amazing how few STL algorithms we think about using in general! – Matthieu M. Mar 14 '12 at 15:15
  • i get from the documentation that this allow me to get the k -smallest element... but how do i used this to the all the k smallest elements without getting to nlogn + k in complexity? – GreyGeek Mar 14 '12 at 15:18
  • 3
    @GreyGeek: After using `std::nth_element` all the elements _less_ than the nth are in the range [begin, nth], they just aren't ordered. If you need them ordered you can sort that sub-range. – Blastfurnace Mar 14 '12 at 15:22
  • 3
    If you need them sorted however, you should probably use `partial_sort` to start with. – Mark B Mar 14 '12 at 15:52
  • 1
    @MarkB: Good point. I just did some brief testing and, for small values of `k`, using `std::partial_sort` can be significantly faster than `std::nth_element + std::sort`. For large `k` the two-step solution wins. Looks like something that needs to be profiled for GreyGeek's specific dataset. – Blastfurnace Mar 14 '12 at 16:10
  • someone knows a good reference that explains in details the complexity of std::nth_element ? – GreyGeek Mar 14 '12 at 16:12
  • @GreyGeek: The C++ Standard just says "Linear on average". – Blastfurnace Mar 14 '12 at 16:18
7

You might want to have a look at partial_sort()

It is both simple to understand and requires no extra work, and expected to be better [or at least not worse then] sort() if you only care for the kth element.

For optimal performance - you might want to use selection algorithm, but it requires more work.

amit
  • 175,853
  • 27
  • 231
  • 333
  • Note that as the name implies, `std::partial_sort()` has to order elements preceding k'th element. Which is not necessary if only k'th element is of interest. – Maxim Egorushkin Mar 14 '12 at 15:15
1

First Algorithm:

Steps:

  • Use Selection algorithm to find the k_th element.
  • Choose it as a pivot and perform partition (from quick sort algorithm ) resulting in having the smallest k elements left to the pivot.

Complexity: O(n) worst case

Second Algorithm:

Steps:

  • Use make_heap to create heap out of the array elements.
  • Perform k times the following:
    • read first element.
    • pop it using pop_heap

you could use priorty queue's methods (c'tor, top, pop) to achieve the same woithout knowing the algorithm lying behind.

Complexity: O(n+ k * log(n)) worst case

Hanna Khalil
  • 975
  • 1
  • 10
  • 28