move from right side to odd positions and from left side to even in-place

Question

Given a non-empty array of items. You have to move all the items from the right side to odd positions (zero-based), and from the left side — to even positions, as follows:

raw data: 0 2 4 6 8 10 12 14 1 3 5 7 9 11 13

result: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

What in-place algorithm exists for this with O(n) time complexity? What is its implementation?

The inverse problem is solved here (this algorithm can be essentially inverted, but it will look ugly).

I guess technically it's an "algorithm", but it seems simply splitting the list in half in O(n), then interleaving the two halves (also O(n)), would suffice... Although, I guess that assumes a contiguous array data structure - a singly linked list probably can't be split in O(n), although the interleaving would still be linear... — twalberg, Nov 09 '12 at 15:54
@twalberg, sounds like you've got the beginnings of an answer there. Can you talk more about the interleaving process? Keeping in mind that it must be in-place. — Kevin, Nov 09 '12 at 16:07
Oops... missed the in-place requirement. What I was imagining would require the use of auxiliary storage, although the result could certainly be copied back to the input array... @Kevin - didn't elaborate too much because it smelled a bit like a homework/exam type question that it would best benefit the OP to fully develop the solution to, with the help of some initial pointers... — twalberg, Nov 09 '12 at 16:20
I think you're wrong. On this subject, written serious scientific works. — Tomilov Anatoliy, Nov 09 '12 at 17:28

score 1 · Accepted Answer · edited May 23 '17 at 12:34

Here is only the algorithm itself. For details, explanations, and alternative approaches see answer for the inverse problem.

Initialize pointer to the pool of right-side elements to N/2.
Get largest sub-array having size 3^k+1
Join (3^k+1)/2 elements from the array's beginning and (3^k+1)/2 elements from the pool of right-side elements by exchanging appropriate sub-arrays. Update pool pointer.
Apply cycle leader algorithm to the parts of this sub-array, starting from positions 1, 3, 9, ... 3^k-1: move element to its proper position in sub-array (elements from the left of sub-array to even positions, from the right - to odd positions), the replaced element should be also moved to its proper position, etc. until this procedure comes back to the starting position.
Process the remaining parts of the array recursively, using steps 2 .. 4.

This problem is simpler than the inverse problem, referred in OP, because here we have to reorder sub-arrays starting from the larger ones, in the same order as doing cycle leader algorithm (the inverse problem has to do it separately and in reverse order, starting from the smaller ones to obtain O(N) complexity).

WRT the second clause: sub-array of what? Sub-array of the lower half? What to do if we have odd length of source array? — Tomilov Anatoliy, May 29 '14 at 07:18
@Orient: I mean the left-most sub-array of the whole array (more precisely, of its not-yet-processed part). And as far as I understand, you've implemented it exactly like this. With odd length of source array all depends on how you define "right side". If "right side" starts from the middle element, this problem has no solution. If "right side" starts from the element next to middle one, we could just add (implicitly) one more element to the array, then move elements as usual, then remove this additional element (which is still at the end of the array). — Evgeny Kluev, May 29 '14 at 10:01

Tomilov Anatoliy · Answer 2 · 2014-06-04T07:34:21.020

I finally found the solution of direct and inverse problems:

#include <iterator>
#include <algorithm>
#include <type_traits>
#include <limits>
#include <deque>
#include <utility>

#include <cassert>

template< typename Iterator >
struct perfect_shuffle_permutation
{

    static_assert(std::is_same< typename std::iterator_traits< Iterator >::iterator_category, std::random_access_iterator_tag >::value,
                  "!");

    using difference_type = typename std::iterator_traits< Iterator >::difference_type;
    using value_type = typename std::iterator_traits< Iterator >::value_type;

    perfect_shuffle_permutation()
    {
        for (difference_type power3_ = 1; power3_ < std::numeric_limits< difference_type >::max() / 3; power3_ *= 3) {
            powers3_.emplace_back(power3_ + 1);
        }
        powers3_.emplace_back(std::numeric_limits< difference_type >::max());
    }

    void
    forward(Iterator _begin, Iterator _end) const
    {
        return forward(_begin, std::distance(_begin, _end));
    }

    void
    backward(Iterator _begin, Iterator _end) const
    {
        return backward(_begin, std::distance(_begin, _end));
    }

    void
    forward(Iterator _begin, difference_type const _size) const
    {
        assert(0 < _size);
        assert(_size % 2 == 0);
        difference_type const left_size_ = *(std::upper_bound(powers3_.cbegin(), powers3_.cend(), _size) - 1);
        cycle_leader_forward(_begin, left_size_);
        difference_type const rest_ = _size - left_size_;
        if (rest_ != 0) {
            Iterator middle_ = _begin + left_size_;
            forward(middle_, rest_);
            std::rotate(_begin + left_size_ / 2, middle_, middle_ + rest_ / 2);
        }
    }

    void
    backward(Iterator _begin, difference_type const _size) const
    {
        assert(0 < _size);
        assert(_size % 2 == 0);
        difference_type const left_size_ = *(std::upper_bound(powers3_.cbegin(), powers3_.cend(), _size) - 1);
        std::rotate(_begin + left_size_ / 2, _begin + _size / 2, _begin + (_size + left_size_) / 2);
        cycle_leader_backward(_begin, left_size_);
        difference_type const rest_ = _size - left_size_;
        if (rest_ != 0) {
            Iterator middle_ = _begin + left_size_;
            backward(middle_, rest_);
        }
    }

private :

    void
    cycle_leader_forward(Iterator _begin, difference_type const _size) const
    {
        for (difference_type leader_ = 1; leader_ != _size - 1; leader_ *= 3) {
            permutation_forward permutation_(leader_, _size);
            Iterator current_ = _begin + leader_;
            value_type first_ = std::move(*current_);
            while (++permutation_) {
                assert(permutation_ < _size);
                Iterator next_ = _begin + permutation_;
                *current_ = std::move(*next_);
                current_ = next_;
            }
            *current_ = std::move(first_);
        }
    }

    void
    cycle_leader_backward(Iterator _begin, difference_type const _size) const
    {
        for (difference_type leader_ = 1; leader_ != _size - 1; leader_ *= 3) {
            permutation_backward permutation_(leader_, _size);
            Iterator current_ = _begin + leader_;
            value_type first_ = std::move(*current_);
            while (++permutation_) {
                assert(permutation_ < _size);
                Iterator next_ = _begin + permutation_;
                *current_ = std::move(*next_);
                current_ = next_;
            }
            *current_ = std::move(first_);
        }
    }

    struct permutation_forward
    {

        permutation_forward(difference_type const _leader, difference_type const _size)
            : leader_(_leader)
            , current_(_leader)
            , half_size_(_size / 2)
        { ; }

        bool
        operator ++ ()
        {
            if (current_ < half_size_) {
                current_ += current_;
            } else {
                current_ = 1 + (current_ - half_size_) * 2;
            }
            return (current_ != leader_);
        }

        operator difference_type () const
        {
            return current_;
        }

    private :

        difference_type const leader_;
        difference_type current_;
        difference_type const half_size_;

    };

    struct permutation_backward
    {

        permutation_backward(difference_type const _leader, difference_type const _size)
            : leader_(_leader)
            , current_(_leader)
            , half_size_(_size / 2)
        { ; }

        bool
        operator ++ ()
        {
            if ((current_ % 2) == 0) {
                current_ /= 2;
            } else {
                current_ = (current_ - 1) / 2 + half_size_;
            }
            return (current_ != leader_);
        }

        operator difference_type () const
        {
            return current_;
        }

    private :

        difference_type const leader_;
        difference_type current_;
        difference_type const half_size_;

    };

    std::deque< difference_type > powers3_;

};

Yes, it seems so, but the use of `swap` - is the most common approach. — Tomilov Anatoliy, Nov 10 '12 at 12:44
In cycle-leader algorithm, we move forward the previous value, replacing the current one. Thus, we cannot avoid the three assignments per cycle. — Tomilov Anatoliy, Nov 10 '12 at 13:16
The problem is not in the `swap` itself, but in swapping two memory locations. We cannot avoid three assignments per cycle, but we can avoid two memory writes per cycle if we keep leader value in local variable and write it to memory only after loop is completed. Probably compiler is smart enough to do this optimization for us, but I'm not sure. — Evgeny Kluev, Nov 10 '12 at 13:37
I have modified the algorithm in proper way. In this variant permuatations done in reverse order, and the leader is stored in a local variable and is used only at the last step. — Tomilov Anatoliy, Nov 12 '12 at 08:26
@EvgenyKluev I rewrote the solution. Now its implementation is more clear. [Repository](https://bitbucket.org/tomilov/perfect-shuffle-permutation). — Tomilov Anatoliy, Jun 04 '14 at 07:35
With recursion this solution is much clearer indeed. But (strictly speaking) it is not in-place anymore - because of the recursion stack and because of powers-of-3 array. I don't see real necessity to make sub-array size computation O(log log N), it does not improve asymptotic complexity, and it improves speed only very slightly. But if you really want to do this, it is possible to pre-compute powers-of-3 array at compile time... — Evgeny Kluev, Jun 04 '14 at 09:51
Or (to get rid of the power-of-3 array and make computation almost branchless) you could use either bitwise tricks or compiler intrinsic to get approximate base-2 logarithm, rescale it to base-3 logarithm, then use binary exponentiation to choose optimal sub-array size... — Evgeny Kluev, Jun 04 '14 at 09:52
By the way, it is not easy to make this O(N) algorithm really fast. For small `N` it is just too complicated, for large `N` it might be very cache-unfriendly (unless you insert `prefetch` instruction). Simpler O(N log N) algorithm may be faster (at least in some cases). Also if you want to generalise this algorithm, you could read this answer to other question: ["Interleave three equally sized partitions in an array inplace in O(n) time"](http://stackoverflow.com/a/23529396/1009831). — Evgeny Kluev, Jun 04 '14 at 09:53
Since you compute `left_size_` with binary search on every recursion step, total time for subarray size computation is O(log N * log log N). It may be decreased to O(log N) if you search power-of-3 array linearly, starting from the position found on previous recursion step. — Evgeny Kluev, Jun 04 '14 at 10:26
@EvgenyKluev thank you for the note. If you find something else, please keep me informed. — Tomilov Anatoliy, Jun 04 '14 at 10:30
@EvgenyKluev WRT computing powers-of-three at compile time I found, that expressive power of the **C++** templates not allows to do it for arbitrary integer types. — Tomilov Anatoliy, Jun 04 '14 at 10:33
@EvgenyKluev I don't want to use non-exact arithmetic for the powers of three computing. — Tomilov Anatoliy, Jun 04 '14 at 10:36
Generalisation of the problem is not my choise, because only the Haar wavelet computing is interesting to me as an application. — Tomilov Anatoliy, Jun 04 '14 at 10:47
@EvgenyKluev If I rewrite the algorithm in such manner, that all 40 (for 64-bit) possible recursion steps will be lenear after unrolling, then whether or not I achieve `O(1)`-memory requirement? — Tomilov Anatoliy, Jun 04 '14 at 10:56
Yes, certainly you would achieve O(1) space after transforming recursion to iteration. In fact, your previous implementation was truly in-place. Still recursive implementation is much more clear and O(log N) additional stack space is so insignificant in practice, so I don't see any reason to abandon it unless you have very tight memory constraints. — Evgeny Kluev, Jun 04 '14 at 11:11
@EvgenyKluev I should note that maximum number of recursion steps is only logarithmically depends on maximum length of sequence. If we works with fixed finite amount of memory, then we have const additional amount of memory. — Tomilov Anatoliy, Jun 04 '14 at 15:36

move from right side to odd positions and from left side to even in-place

2 Answers2