0

I know that itertools has a method for generating combinations, as described here: Get unique combinations of elements from a python list. However, I'm looking for an iterator that gives unique combinations and their multiplicities.

Example: I have an expression that only depends on which combination of 2 elements I select from a list L = [2,1,2,2]. I need to sum the result for all combinations. What I want is an iterator that gives e.g. (([1,2], 3), ([2,2], 3)). That way, I can compute the expression for just the 2 unique combinations and multiply by 3, rather than computing for all 6 combinations, of which many give the same result.

Community
  • 1
  • 1
bjarkemoensted
  • 2,557
  • 3
  • 24
  • 37
  • It would be really helpful if someone could explain why this gets downvoted. This is my first question here and I made an effort to check for existing answers, mention related but insufficient answers as the guidelines state and write it clearly. – bjarkemoensted Mar 03 '16 at 21:01
  • 1
    If you've searched SO and the interwebz for your answer and still not found anything relevant, your next step is to try and implement an iterator yourself that does what you want. If you present your question that shows you searched and found nothing, and have attempted it yourself (and your question shows your code), you'll likely get a better response. – PeterT Mar 03 '16 at 22:28

2 Answers2

2

You can combine itertools.combinations with collections.Counter.

import itertools
import collections  

L =  [2,1,2,2]
c = collections.Counter()
c.update(map(tuple, map(sorted, itertools.combinations(L, 2))))

c.items() then gives:

>>> c.items()
[((1, 2), 3), ((2, 2), 3)]

To break it down, itertools.combinations(L, 2) gives all the ordered combinations of L of length 2. We then use sorted to make them comparable since collections.Counter will use hashing and equality to count. Finally, because list objects are not hashable, we convert them to tuple objects which are.

Jake Cobb
  • 1,811
  • 14
  • 27
  • No idea why people are downvoting. You could use a little formatting and typo correction but it otherwise seems fine. Updated the answer for unordered, now we use `sorted` to make them comparable. – Jake Cobb Mar 03 '16 at 21:13
  • Thanks - it's really elegant and knocked my runtime down by at least a factor of 10. – bjarkemoensted Mar 03 '16 at 21:56
0

In the end, my code took too long to explicitly count every possible combination, so I came up with a way to find only the unique ones and then analytically compute their multiplicities. It's based on the following idea: Call the input list A and the number of elements in each subset k. First sort the list and initialize k pointers to the first k elements of A. Then repeatedly attempt to move the rightmost pointer to the right until it encounters a new value. Every time another pointer than the rightmost is moved, all pointers to its right are set to its neighbors, e.g. if pointer 1 is moved to index 6, pointer 2 is moved to index 7 and so on.

The multiplicity of any combination C can be found by multiplying the binomial coefficients (N_i, m_i) where N_i and m_i are the number of occurrences of element i in A and C, respectively.

Below is an implementation of a brute force approach, and a method which exploits uniqueness.

This figure compares the runtime of brute force counting vs. my approach. Counting becomes infeasible when the input list has about 20 elements. Runtime comparison

# -*- coding: utf-8 -*-
from __future__ import division

from itertools import combinations
from collections import Counter
from operator import mul
import numpy as np
from scipy.special import binom

def brute(A, k):
    '''This works, but counts every combination.'''
    A_sorted = sorted(A)
    d = {}
    for comb in combinations(A_sorted, k):
        try:
            d[comb] += 1
        except KeyError:
            d[comb] = 1
        #
    return d


def get_unique_unordered_combinations(A, k):
        '''Returns all unique unordered subsets with size k of input array.'''
    # If we're picking zero elements, we can only do it in one way. Duh.
    if k < 0:
        raise ValueError("k must be non-negative")

    if k == 0 or k > len(A):
        yield ()
        return  # Done. There's only one way to select zero elements :)

    # Sorted version of input list
    A = np.array(sorted(A))
    # Indices of currently selected combination
    inds = range(k)
    # Pointer to the index we're currently trying to increment
    lastptr = len(inds) - 1

    # Construct list of indices of next element of A different from current.
    # e.g. [1,1,1,2,2,7] -> [3,3,3,5,5,6] (6 falls off list)
    skipper = [len(A) for a in A]
    prevind = 0
    for i in xrange(1, len(A)):
        if A[i] != A[prevind]:
            for j in xrange(prevind, i):
                skipper[j] = i
            prevind = i
        #

    while True:
        # Yield current combination from current indices
        comb = tuple(A[inds])
        yield comb

        # Try attempt to change indices, starting with rightmost index
        for p in xrange(lastptr, -1 , -1):
            nextind = skipper[inds[p]]
            #print "Trying to increment index %d to %d"  % (inds[p], nextind)
            if nextind + (lastptr - p) >= len(A):
                continue  # No room to move this pointer. Try the next
            #print "great success"
            for i in xrange(lastptr-p+1):
                inds[p+i] = nextind + i
            break
        else:
            # We've exhausted all possibilities, so there are no combs left
            return
bjarkemoensted
  • 2,557
  • 3
  • 24
  • 37