How to efficiently permute an array in-place (using std::swap)

Question

How can I apply a permutation in-place? My permutations are effectively size_t[] where perm[i] represents the target index for an input index i.

I know how to apply a permutation if I have an input and output array:

struct Permutation {
    std::vector<size_t> perm;

    template <typename T>
    void apply(const T in[], T out[]) const
    {
        for (size_t i = 0; i < size(); ++i) {
            out[i] = std::move(in[perm[i]]);
        }
    }

}

However, I would like to do this with only one array, similar to how std::sort works, so just using std::swap. My idea so far is:

struct Permutation {
    std::vector<size_t> perm;

    template <typename T>
    void apply(T data[]) const
    {
        for (size_t i = 0; i < size(); ++i) {
            std::swap(data[i], data[perm[i]]);
        }
    }

}

But this wouldn't work. For example:

Permutation perm = {2, 1, 0};
char data[] {'a', 'b', 'c'};
perm.apply(data);

// because I swap indices 0 and 2 twice, I end up with the input array
data == {'a', 'b', 'c'};

So how do I correctly permute an array in-place? It is okay if additional memory is allocated, as long as this happens in a pre-computation step when the Permutation is constructed. I want the in-place permutation to happen fast and from the looks of it, demanding that no additional memory is allocated at all will lead to some severe performance sacrifices.

I am specifically referencing Algorithm to apply permutation in constant memory space, where all of the provided answers either cheat by using negative-integer space to avoid an allocation or enter "nasty" nested loops which blow up the time complexity to O(n²).

Edits

Please pay attention before suggesting std::next_permutation. I am not trying to generate all possible permutations, which I could do with std::next_permutation. I am instead trying to apply a single particular permutation to an array.

Does this answer your question? [C++ Code to Generate Permutations](https://stackoverflow.com/questions/28007090/c-code-to-generate-permutations) — Shridhar R Kulkarni, Sep 02 '20 at 10:32
@ShridharRKulkarni no, it obviously does not, see the previous comment. — Jan Schultke, Sep 02 '20 at 10:36
I don't think it is possible in O(n) time without additional memory. And in the worst case this additional memory is actually a new array, just like your output array. — Dialecticus, Sep 02 '20 at 10:52
You need to extract [cycles](https://en.wikipedia.org/wiki/Cyclic_permutation) from your permutation and apply them. — n. m. could be an AI, Sep 02 '20 at 10:54
@Dialecticus I wouldn't be so quick to make this statement. Permutations consist of multiple cycles and at least each individual cycle can be applied by continuously swapping. — Jan Schultke, Sep 02 '20 at 10:54
True, but you have to save the information about which items are already swapped. Additional memory. — Dialecticus, Sep 02 '20 at 10:55
I assume that this is correct: https://medium.com/@kevingxyz/permutation-in-place-8528581a5553 — , Sep 02 '20 at 10:56
@YvesDaoust yeah that looks like it might work, it's just unfortunate that it resorts to a nested loop. — Jan Schultke, Sep 02 '20 at 10:59
I guess that an efficient solution is presented in Knuth, working in linear time. It uses the fact that any permutation is made of a number of cycles, and cycles are easily permuted. — , Sep 02 '20 at 11:01
Just check every time if `perm[i] < i` to avoid duplicated swaps — pptaszni, Sep 02 '20 at 11:22
@pptaszni This doesn't work. The (now deleted) comment mentioned a sufficient counter-example: It wouldn't work e.g. for `{ 2, 0, 1 }`. — Scheff's Cat, Sep 02 '20 at 11:44
Does this answer your question? [Algorithm to apply permutation in constant memory space](https://stackoverflow.com/questions/16501424/algorithm-to-apply-permutation-in-constant-memory-space) — pptaszni, Sep 02 '20 at 11:47
@pptaszni not quite, I have edited the question to make it more distinct from what you have provided. — Jan Schultke, Sep 02 '20 at 12:34

Jan Schultke · Accepted Answer · 2020-09-02T12:19:15.170

The hint to find the cycles and permute each cycle worked for me. To sum up my approach, I find the start indices of all cycles in the constructor. Then, in apply(), I permute each cycle by just repeatedly using std::swap.

struct Permutation {
private:
    /// The single vector which stores both the permutation
    /// AND the indices of the cycles starts.
    std::vector<size_t> perm;
    /// The size of the permutation / index of first cycle index.
    size_t permSize;

public:
    Permutation(std::vector<size_t> table)
        : perm{std::move(table)}, permSize{perm.size()} {
        findCycles();
    }

    template <typename T>
    void apply(T data[]) const {
        for (size_t cycle = permSize; cycle < perm.size(); ++cycle) {
            const size_t start = perm[cycle];
            for (size_t prev = start, next = perm[prev];
                 next != start;
                 prev = next, next = perm[next]) {
                std::swap(data[prev], data[next]);
            }
        }
    }

    size_t size() const {
        return permSize;
    }

private:
    void findCycles();
}

findCycles() is also easy to implement, but requires the temporary allocation of a bit-vector.

void Permutation::findCycles() {
    std::vector<bool> visited(size());

    for (size_t i = 0; i < size(); ++i) {
        if (visited[i]) {
            continue;
        }
        for (size_t j = i; not visited[j]; ) {

            visited[j] = true;
            j = perm[j];
        }
        perm.push_back(i);
    }
}

In worst case there will be n/2 cycles, each of length 2. Vector of size n/2 is an additional memory. It is less than the original output vector of size n, but you pay the price of having to run twice through the input vector. Time is usually more of an issue than memory, but it's finally up to you to figure out what you have and what you need. — Dialecticus, Sep 02 '20 at 13:48
@Dialecticus thanks but the worst case is an identity permutation, where there are N cycles each of length 1. So the worst case is doubling the input vector in size. I have benchmarked both `apply` methods and the in-place method was almost exactly twice as slow. So there really isn't much of a point in using it, unless one is working with gigabytes of data where caching becomes problematic. — Jan Schultke, Sep 02 '20 at 13:51

How to efficiently permute an array in-place (using std::swap)

Edits

1 Answers1