Combination of a Collection with Repetitions

Question

There are a lot of links on http://stackoverflow.com for how to do combinations: Generating combinations in c++ But these links presume to draw from an infinite set without repetition. When given a finite collection which does have repetition, these algorithms construct duplicates. For example you can see the accepted solution to the linked question failing on a test case I constructed here: http://ideone.com/M7CyFc

Given the input set: vector<int> row {40, 40, 40, 50, 50, 60, 100};

I expect to see:

40 40 40
40 40 50
40 40 60
40 40 100
40 50 50
40 50 60
40 50 100
40 60 100
50 50 60
50 50 100
50 60 100

Obviously I can use the old method store the output and check for duplicates as I generate, but this requires a lot of extra memory and processing power. Is there an alternative that C++ provides me?

Are you asking to select three elements from seven disregarding identical values ? You should get 7C3 = 35 sets in that case. You have listed only 11. An idea would be to do combinations of indexes 0 - 7 and return the elements indicated by the indexes. — Some Guy, Feb 04 '16 at 18:28
@NathanOliver Yeah, Jarod42 had the right of it I meant to type 30s in my `row` declaration. — Jonathan Mee, Feb 04 '16 at 18:30
@RandomGuy I'm asking for combinations without repetition. You'll notice that my http://ideone.com link in the question produces 35 results, but even the first 2 are duplicates, because it expects that I am providing an input set that is unique. I am not. I did do the combinations for my example by hand but I believe that I have everything. — Jonathan Mee, Feb 04 '16 at 18:33
Your output contains the values, yet the direction is reversed and duplicates are not filtered. To change the output of the direction check for `(v[n-i-1])` instead of `(v[i])`. — Simon Kraemer, Feb 04 '16 at 18:38
@SimonKraemer I really don't care about order what I care about is avoiding duplicates. I don't see how a change in order will help me there. — Jonathan Mee, Feb 04 '16 at 18:41
@Jarod42 Yeah that's what i had too, but my code was a bit more complex because I didn't expect a sorted input set. Anyway, that's what i'd like to get away from, because in a large input set this will waste large amounts of time generating duplicate results. — Jonathan Mee, Feb 04 '16 at 18:44
Please use the [edit link](http://stackoverflow.com/posts/35208664/edit) under the question to make 30 appear or disappear consistently. — n. m. could be an AI, Feb 04 '16 at 18:48
I think you might have an easier time finding relevant algorithms if you were more careful with your terminology. By definition a set never contains duplicates, so the input you're calling a set isn't really a set at all. — Jerry Coffin, Feb 04 '16 at 19:09
@JerryCoffin Thanks, I'm not down with the math terms, but you're correct. I've edited. — Jonathan Mee, Feb 04 '16 at 20:04
@Howard Hinnant wrote a whole mini-library on that topic, see [here](http://howardhinnant.github.io/combinations.html). I bet you find the solution therein. — davidhigh, Feb 13 '16 at 11:47
@davidhigh Interesting, I followed up on the Howard Hinnant lead, and he cites [N2639](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2639.pdf) as being slightly slower: http://stackoverflow.com/a/5286517/2642059 I believe this is due to the more standard-like `next_combination` paradigm which requires calls and returns per-combination as opposed to Howard Hinnant's `for_each_combination` paradigm. But as [N2639 looks to be accepted into the standard](http://bit.ly/20yPiNp), I've adopted that into my answer: http://stackoverflow.com/a/35215540/2642059 — Jonathan Mee, Feb 15 '16 at 13:35

Jonathan Mee · Answer 1 · 2018-10-22T12:22:07.953

Combinations by definition do not respect order. This frees us to arrange the numbers in any order we see fit. Most notably we can rely on to provide a combination rank. Certainly the most logical way to rank combinations is in sorted order, so we'll be depending upon our inputs being sorted.

There is already precedent for this in the standard library. For example lower_bound which we will actually use in this solution. When used generally this may however require the user to sort before passing.

The function we will write to do this will take in iterators to the sorted collection which the next combination is to be drawn from, and iterators to the current combination. We'd also need the size but that can be derived from the distance between the combination's iterators.

template <typename InputIt, typename OutputIt>
bool next_combination(InputIt inFirst, InputIt inLast, OutputIt outFirst, OutputIt outLast) {
    assert(distance(inFirst, inLast) >= distance(outFirst, outLast));

    const auto front = make_reverse_iterator(outFirst);
    const auto back = make_reverse_iterator(outLast);
    auto it = mismatch(back, front, make_reverse_iterator(inLast)).first;

    const auto result = it != front;

    if (result) {
        auto ub = upper_bound(inFirst, inLast, *it);

        copy(ub, next(ub, distance(back, it) + 1), next(it).base());
    }
    return result;
}

This function is written in the format of the other algorithm functions, so any container that supports bidirectional iterators can be used with it. For our example though we'll use: const vector<unsigned int> row{ 40U, 40U, 40U, 50U, 50U, 60U, 100U }; which is, necessarily, sorted:

vector<unsigned int> it{ row.cbegin(), next(row.cbegin(), 3) };

do {
    copy(it.cbegin(), it.cend(), ostream_iterator<unsigned int>(cout, " "));
    cout << endl;
} while(next_combination(row.cbegin(), row.cend(), it.begin(), it.end()));

Live Example

After writing this answer I've done a bit more research and found N2639 which proposes a standardized next_combination, which was:

Actively under consideration for a future TR, when work on TR2 was deferred pending

Viewed positively at the time

Due at least one more revision before any adoption

Needed some reworking to reflect the addition of C++11 language facilities

[Source]

Using N2639's reference implementation requires mutability, so we'll use: vector<unsigned int> row{ 40U, 40U, 40U, 50U, 50U, 60U, 100U };. And our example code becomes:

vector<unsigned int>::iterator it = next(row.begin(), 3);

do {
    copy(row.begin(), it, ostream_iterator<unsigned int>(cout, " "));
    cout << endl;
} while(next_combination(row.begin(), it, row.end()));

Live Example

Bob__ · Accepted Answer · 2016-02-05T12:37:10.630

1

You can do something like this (maybe avoiding the recursion):

#include <iostream>
#include <vector>
#include <algorithm>

using std::cout;
using std::vector;

void perm( const vector<int> &v, vector<vector<int>> &p, vector<int> &t, int k, int d) {
    for ( int i = k; i < v.size(); ++i ) {
        // skip the repeted value
        if ( i != k && v[i] == v[i-1]) continue;
        t.push_back(v[i]);
        if ( d > 0 ) perm(v,p,t,i+1,d-1);
        else p.push_back(t);
        t.pop_back();
    }
}

int main() {
    int r = 3;
    vector<int> row {40, 40, 40, 50, 50, 60, 100};
    vector<vector<int>> pp;
    vector<int> pe;

    std::sort(row.begin(),row.end()); // that's necessary
    perm(row,pp,pe,0,r-1);

    cout << pp.size() << '\n';
    for ( auto & v : pp ) {
        for ( int i : v ) {
            cout << ' ' << i;
        }
        cout << '\n';
    }
    return 0;
}

Which outputs:

I know, it's far from efficient, but if you get the idea you may come out with a better implementation.

edited Feb 05 '16 at 12:37

answered Feb 04 '16 at 20:12

Bob__

12,361
3
28
42

As specified in the question, the goal is *not* to generate all permutations. That's exactly what this does, using a `set` is a nice touch in that it saves us the memory required to construct all these, but it does nothing to avoid the processing time required. – Jonathan Mee Feb 04 '16 at 20:43
@JonathanMee I added another algorithm. See if it can help you finding something usefull. – Bob__ Feb 04 '16 at 22:29
Giving a +1 on the recursive solution. The original answer is obviously not helpful. I put in [my own solution](http://stackoverflow.com/a/35215540/2642059) but I believe that for a case when all combinations are required yours is superior. So I'd appreciate if you could edit away the incorrect answer. – Jonathan Mee Feb 05 '16 at 02:57

davidhigh · Answer 3 · 2016-02-05T00:07:12.030

Here is a class I once wrote in my university times to handle bosons. It's quite long, but it's generally usable and seems to work well. Additionally, it also gives ranking and unranking functionality. Hope that helps -- but don't ever ask me what I was doing back then ... ;-)

struct SymmetricIndex
{
    using StateType = std::vector<int>;
    using IntegerType = int;
    int M;
    int N;
    StateType Nmax;
    StateType Nmin;
    IntegerType _size;
    std::vector<IntegerType> store;
    StateType state;
    IntegerType _index;

    SymmetricIndex() = default;
    SymmetricIndex(int _M, int _N, int _Nmax = std::numeric_limits<int>::max(), int _Nmin = 0)
        : SymmetricIndex(_M, _N, std::vector<int>(_M + 1, std::min(_Nmax, _N)), StateType(_M + 1, std::max(_Nmin, 0)))
    {}

    SymmetricIndex(int _M, int _N, StateType const& _Nmax, StateType const& _Nmin)
        : N(_N)
        , M(_M)
        , Nmax(_Nmax)
        , Nmin(_Nmin)
        , store(addressArray())
        , state(M)
        , _index(0)
    {
        reset();
        _size = W(M, N);
    }

    friend std::ostream& operator<<(std::ostream& os, SymmetricIndex const& sym);

    SymmetricIndex& reset()
    {
        return setBegin();
    }


    bool setBegin(StateType& state, StateType const& Nmax, StateType const& Nmin) const
    {
        int n = N;
        for (int i = 0; i<M; ++i)
        {
            state[i] = Nmin[i];
            n -= Nmin[i];
        }

        for (int i = 0; i<M; ++i)
        {
            state[i] = std::min(n + Nmin[i], Nmax[i]);
            n -= Nmax[i] - Nmin[i];
            if (n <= 0)
                break;
        }

        return true;
    }


    SymmetricIndex& setBegin()
    {
        setBegin(state, Nmax, Nmin);
        _index = 0;
        return *this;
    }


    bool isBegin() const
    {
        return _index==0;
    }

    bool setEnd(StateType& state, StateType const& Nmax, StateType const& Nmin) const
    {
        int n = N;
        for (int i = 0; i < M; ++i)
        {
            state[i] = Nmin[i];
            n -= Nmin[i];
        }

        for (int i = M - 1; i >= 0; --i)
        {
            state[i] = std::min(n + Nmin[i], Nmax[i]);
            n -= Nmax[i] - Nmin[i];
            if (n <= 0)
                break;
        }
        return true;
    }
    SymmetricIndex& setEnd()
    {
        setEnd(state, Nmax, Nmin);
        _index = _size - 1;
        return *this;
    }

    bool isEnd() const
    {
        return _index == _size-1;
    }

    IntegerType index() const
    {
        return _index;
    }

    IntegerType rank(StateType const& state) const
    {
        IntegerType ret = 0;
        int n = 0;

        for (int i = 0; i < M; ++i)
        {
            n += state[i];
            for (int k = Nmin[i]; k < state[i]; ++k)
                ret += store[(n - k) * M + i];
        }

        return ret;
    }

    IntegerType rank() const
    {
        return rank(state);
    }

    StateType unrank(IntegerType rank) const
    {
        StateType ret(M);

        int n = N;
        for (int i = M-1; i >= 0; --i)
        {
            int ad = 0;
            int k = std::min(Nmax[i] - 1, n);

            for (int j = Nmin[i]; j <= k; ++j)
                ad+=store[(n - j) * M + i];

            while (ad > rank && k >= Nmin[i])
            {
                ad -= store[(n - k) * M + i];
                --k;
            }

            rank -= ad;
            ret[i] = k+1;
            n -= ret[i];
            if (n <= 0)
            {
                return ret;
            }
        }

        return ret;
    }

    IntegerType size() const
    {
        return _size;
    }

    operator StateType& ()
    {
        return state;
    }

    auto operator[](int i) -> StateType::value_type& { return state[i]; }

    operator StateType const& () const
    {
        return state;
    }
    auto operator[](int i) const -> StateType::value_type const& { return state[i]; }


    bool nextState(StateType& state, StateType const& Nmax, StateType const& Nmin) const
    {
        //co-lexicographical ordering with Nmin and Nmax:

        // (1) find first position which can be decreased
        //     then we have state[k] = Nmin[k]  for k in [0,pos]
        int pos = M - 1;
        for (int k = 0; k < M - 1; ++k)
        {
            if (state[k] > Nmin[k])
            {
                pos = k;
                break;
            }
        }

        // if nothing found to decrease, return
        if (pos == M - 1)
        {
            return false;
        }

        // (2) find first position after pos which can be increased
        //     then we have state[k] = Nmin[k]  for k in [0,pos]
        int next = 0;
        for (int k = pos + 1; k < M; ++k)
        {
            if (state[k] < Nmax[k])
            {
                next = k;
                break;
            }
        }
        if (next == 0)
        {
            return false;
        }
        --state[pos];
        ++state[next];

        // (3) get occupation in [pos,next-1] and set to Nmin[k]
        int n = 0;
        for (int k = pos; k < next; ++k)
        {
            n += state[k] - Nmin[k];
            state[k] = Nmin[k];
        }

        // (4) fill up from the start
        for (int i = 0; i<M; ++i)
        {
            if (n <= 0)
                break;
            int add = std::min(n, Nmax[i] - state[i]);
            state[i] += add;
            n -= add;
        }

        return true;
    }


    SymmetricIndex& operator++()
    {
        bool inc = nextState(state, Nmax, Nmin);
        if (inc) ++_index;
        return *this;
    }
    SymmetricIndex operator++(int)
    {
        auto ret = *this;
        this->operator++();
        return ret;
    }

    bool previousState(StateType& state, StateType const& Nmax, StateType const& Nmin) const
    {
        ////co-lexicographical ordering with Nmin and Nmax:

        // (1) find first position which can be increased
        //     then we have state[k] = Nmax[k]  for k in [0,pos-1]
        int pos = M - 1;
        for (int k = 0; k < M - 1; ++k)
        {
            if (state[k] < Nmax[k])
            {
                pos = k;
                break;
            }
        }

        // if nothing found to increase, return
        if (pos == M - 1)
        {
            return false;
        }

        // (2) find first position after pos which can be decreased
        //     then we have state[k] = Nmin[k]  for k in [pos+1,next]
        int next = 0;
        for (int k = pos + 1; k < M; ++k)
        {
            if (state[k] > Nmin[k])
            {
                next = k;
                break;
            }
        }
        if (next == 0)
        {
            return false;
        }
        ++state[pos];
        --state[next];

        int n = 0;
        for (int k = 0; k <= pos; ++k)
        {
            n += state[k] - Nmin[k];
            state[k] = Nmin[k];
        }
        if (n == 0)
        {
            return true;
        }

        for (int i = next-1; i>=0; --i)
        {
            int add = std::min(n, Nmax[i] - state[i]);
            state[i] += add;
            n -= add;
            if (n <= 0)
                break;
        }

        return true;
    }


    SymmetricIndex operator--()
    {
        bool dec = previousState(state, Nmax, Nmin);
        if (dec) --_index;
        return *this;
    }
    SymmetricIndex operator--(int)
    {
        auto ret = *this;
        this->operator--();
        return ret;
    }

    int multinomial() const
    {
        auto v = const_cast<std::remove_reference<decltype(state)>::type&>(state);
        return multinomial(v);
    }

    int multinomial(StateType& state) const
    {
        int ret = 1;
        int n = state[0];
        for (int i = 1; i < M; ++i)
        {
            n += state[i];
            ret *= binomial(n, state[i]);
        }
        return ret;
    }

    SymmetricIndex& random(StateType const& _Nmin)
    {
        static std::mt19937 rng;

        state = _Nmin;
        int n = std::accumulate(std::begin(state), std::end(state), 0);
        auto weight = [&](int i) { return state[i] < Nmax[i] ? 1 : 0; };

        for (int i = n; i < N; ++i)
        {
            std::discrete_distribution<int> d(N, 0, N, weight);
            ++state[d(rng)];
        }
        _index = rank();

        return *this;
    }
    SymmetricIndex& random()
    {
        return random(Nmin);
    }

private:
    IntegerType W(int m, int n) const
    {
        if (m < 0 || n < 0) return 0;
        else if (m == 0 && n == 0) return 1;
        else if (m == 0 && n > 0) return 0;
        //else if (m > 0 && n < Nmin[m-1]) return 0;
        else
        {
            //static std::map<std::tuple<int, int>, IntegerType> memo;

            //auto it = memo.find(std::make_tuple(k, m));
            //if (it != std::end(memo))
            //{
            //  return it->second;
            //}

            IntegerType ret = 0;
            for (int i = Nmin[m-1]; i <= std::min(Nmax[m-1], n); ++i)
                ret += W(m - 1, n - i);

            //memo[std::make_tuple(k, m)] = ret;
            return ret;
        }
    }

    IntegerType binomial(int m, int n) const
    {
        static std::vector<int> store;
        if (store.empty())
        {
            std::function<IntegerType(int, int)> bin = [](int n, int k)
            {
                int res = 1;

                if (k > n - k)
                    k = n - k;

                for (int i = 0; i < k; ++i)
                {
                    res *= (n - i);
                    res /= (i + 1);
                }

                return res;
            };

            store.resize(M*M);
            for (int i = 0; i < M; ++i)
            {
                for (int j = 0; j < M; ++j)
                {
                    store[i*M + j] = bin(i, j);
                }
            }
        }
        return store[m*M + n];
    }

    auto addressArray() const -> std::vector<int>
    {
        std::vector<int> ret((N + 1) * M);

        for (int n = 0; n <= N; ++n)
        {
            for (int m = 0; m < M; ++m)
            {
                ret[n*M + m] = W(m, n);
            }
        }
        return ret;
    }
};

std::ostream& operator<<(std::ostream& os, SymmetricIndex const& sym)
{
    for (auto const& i : sym.state)
    {
        os << i << " ";
    }
    return os;
}

Use it like

int main()
{
    int M=4;
    int N=3;
    std::vector<int> Nmax(M, N);
    std::vector<int> Nmin(M, 0);
    Nmax[0]=3;
    Nmax[1]=2;
    Nmax[2]=1;
    Nmax[3]=1;

    SymmetricIndex sym(M, N, Nmax, Nmin);

    while(!sym.isEnd())
    {
        std::cout<<sym<<"   "<<sym.rank()<<std::endl;
        ++sym;
    }
    std::cout<<sym<<"   "<<sym.rank()<<std::endl;
}

This will output

3 0 0 0    0    (corresponds to {40,40,40})
2 1 0 0    1    (-> {40,40,50})
1 2 0 0    2    (-> {40,50,50})
2 0 1 0    3    ...
1 1 1 0    4
0 2 1 0    5
2 0 0 1    6
1 1 0 1    7
0 2 0 1    8
1 0 1 1    9
0 1 1 1    10   (-> {50,60,100})

DEMO

Note that I assumed here an ascending mapping of your set elements (i.e. the number 40's is given by index 0, the number of 50's by index 1, and so on).

More precisely: Turn your list into a map<std::vector<int>, int> like

std::vector<int> v{40,40,40,50,50,60,100};

std::map<int, int> m;

for(auto i : v)
{
    ++m[i];
}

Then use

int N = 3;
int M = m.size();
std::vector<int> Nmin(M,0);
std::vector<int> Nmax;
std::vector<int> val;

for(auto i : m)
{
    Nmax.push_back(m.second);
    val.push_back(m.first);
}

SymmetricIndex sym(M, N, Nmax, Nmin);

as input to the SymmetricIndex class.

To print the output, use

    while(!sym.isEnd())
    {
         for(int i=0; i<M; ++i)
         {
              for(int j = 0; j<sym[i]; ++j)
              {
                  std::cout<<val[i]<<" ";
              }
         }
         std::cout<<std::endl;
    }

    for(int i=0; i<M; ++i)
    {
        for(int j = 0; j<sym[i]; ++j)
        {
            std::cout<<val[i]<<" ";
        }
    }
    std::cout<<std::endl;

All untested, but it should give the idea.

This is not only tldr, but right in the output the 2nd and 3rd lines are repetitions. — Jonathan Mee, Feb 04 '16 at 20:46
@JonathanMee: read it or not, but there are no repetitions. `2 1 0 0` corresponds to `{40,40,50}`, whereas `1 2 0 0` to `{40,50,50}`, and so on. Further, note that it is so long because it can do more than just list the combinations. It can rank and unrank them -- and, for example, thus also generate a random element. Those are features one often needs ... further it takes time only linear in the number of combinations. It actually seems to me that is what you wanted ... — davidhigh, Feb 04 '16 at 20:57
Ah, thank you. I misunderstood what I was looking at. perhaps if this was pared down it would be viable. If my work falls through I will come back and have a look. — Jonathan Mee, Feb 04 '16 at 21:03
Yours is a viable solution and a good one to keep in the back pocket, but it also over complicates what should be a simple solution: http://stackoverflow.com/a/35215540/2642059 — Jonathan Mee, Feb 05 '16 at 03:01
@Jonathan Mee: I agree, your solution is much more concise and should of course be preferred for the task you were asking for. My code -- or better an improved version of it -- should be used only to perform more general tasks like ranking, unranking, construct a random combination, or to answer questions with more constraints -- for example, give only those combinations which have at least one 40 and one 50. — davidhigh, Feb 05 '16 at 12:20

Combination of a Collection with Repetitions

3 Answers3

Linked