5

I came up with this question after reading the excellent answer of std::next_permutation Implementation Explanation. Please refer to that post for an explanation of the algorithm used by STL, but I'll replicate the code here for your reference

#include <vector>
#include <iostream>
#include <algorithm>

using namespace std;

template<typename It>
bool next_permutation(It begin, It end)
{
    if (begin == end)
        return false;

    It i = begin;
    ++i;
    if (i == end)
        return false;

    i = end;
    --i;

    while (true)
    {
        It j = i;
        --i;

        if (*i < *j)
        {
            It k = end;

            while (!(*i < *--k))
                /* pass */;

            iter_swap(i, k);
            reverse(j, end);
            return true;
        }

        if (i == begin)
        {
            reverse(begin, end);
            return false;
        }
    }
}

int main()
{
    vector<int> v = { 1, 2, 3, 4 };

    do
    {
        for (int i = 0; i < 4; i++)
        {
            cout << v[i] << " ";
        }
        cout << endl;
    }
    while (::next_permutation(v.begin(), v.end()));
}

Let's take a look at this part

It k = end;

while (!(*i < *--k))
    /* pass */;

iter_swap(i, k);

From my understanding, this linear scan could be replaced by a binary search, because by construction the elements after i are already in descending (or in case of duplicate elements, non-ascending) order. Suppose we have a boolean array whose items are *idx > *i for each j <= idx < end, then all I need to do is to find the largest index whose item is True. Such an index must exist because we have *j > *i, which means the array starts with a True.

I don't know enough C++ to confidently provide a working example, but here is a complete implementation of next_permutation in Rust. If you don't know Rust, then the following pseudocode should give a good idea of what I mean by "binary search". (Well yes, it's Python, which is readable enough to be referred to as pseudocode :)

from typing import List

def bisearch(last: List[bool]) -> int:
    p, q = 0, len(lst) - 1
    while p + 1 < q:
        mid = (p + q) // 2
        if lst[mid]:
            p = mid
        else:
            q = mid

    return q if lst[q] else q - 1


if __name__ == '__main__':
    for pos_count in range(1, 5):
        for neg_count in range(5):
            lst = [True] * pos_count + [False] * neg_count
            assert bisearch(lst) == pos_count - 1

Question: why doesn't STL's implementation of next_permutation use the binary search? I understand that finding i requires O(n), and that both O(n) + O(n) and O(n) + O(ln(n)) are O(n), but practically speaking binary search should still at least marginally improve performance?

nalzok
  • 14,965
  • 21
  • 72
  • 139
  • 1
    Linear can be much faster over a contagious container ( eg `std::vector`, `std::string` ) because of the memory pre-fetcher. – Richard Critten Jun 24 '21 at 01:20
  • @RichardCritten I understand that sequential access is significantly faster than random access thanks to caching, but does that still apply when they need to look at `N` versus `ln(N)` elements respectively? – nalzok Jun 24 '21 at 01:24
  • 3
    It's not the cache but specifically the pre-fetcher - Here is an example - by Herb Sutter comparing O(n) vs O(log n) for the standard containers (guess which container wins) - starting at 45:48 - https://channel9.msdn.com/Events/Build/2014/2-661 Conclusion - you need to measure for your workload. – Richard Critten Jun 24 '21 at 01:25
  • Additionally (historically) Binary Search has some hard to find corner cases that have caught out quite a few implementations - see - https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues – Richard Critten Jun 24 '21 at 01:38
  • What make you think that a binary search could be used? Obviously one one permutation is totally in order. At least, if you believe that another algorithm could be used, then you should try it for the whole sequence of 16 permutations. – Phil1970 Jun 24 '21 at 01:48
  • @Phil1970 I'm not sure what you mean by *try it for the whole sequence of 16 permutations*. I'm certain the Rust playground example works for any input. Feel free to submit my solution to [LeetCode](https://leetcode.com/problems/next-permutation/) to verify. – nalzok Jun 24 '21 at 01:59
  • 2
    In the linked Rust implementation `let mid = (p + q) / 2;` is a candidate for overflow - I don't know what Rust does when overflow happens, but it's usually best to be avoided. See also the link above to Binary Search implementation issues. – Richard Critten Jun 24 '21 at 02:18
  • 1
    @RichardCritten Thanks for the tip! I'm replacing it with `let mid = p + (q - p) / 2;` – nalzok Jun 24 '21 at 02:21

1 Answers1

5

As @RichardCritten points out, just because we have better algorithmic complexity does not mean that the execution is faster. Additionally, the implementation is a bit more complex.

Below, we have made a very simple alteration to the original algorithm in the middle part.

Original Code:

if (*i < *j)
{
    It k = end;
    
    while (!(*i < *--k))
        /* pass */;
    
    iter_swap(i, k);
    reverse(j, end);
    return true;
}

Simple Binary Approach

if (*i < *j)
{
    auto test = std::lower_bound(std::make_reverse_iterator(end),
                                 std::make_reverse_iterator(i), *i);
    
    std::iter_swap(i, test);
    std::reverse(j, end);
    return true;
}

We make use of std::lower_bound as well as std::make_reverse_iterator since the given range is in descending order.

It should be noted that this implementation is not full proof and does not work when there are repeats. One of the main points is to demonstrate that even for simple cases, the original implementation is faster.

Here is a live ideone example demonstrating the speed of each approach. In our tests, we measure how long each approach takes to generate 10! = 3,628,800 permutations 100 times.

You will note that the linear implementation is almost twice as fast.

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65