6

I want to efficiently find the permutations of a vector that has tied (repeated) values.

E.g., if perm_vector = [0,0,1,2], I would want to obtain as output all distinct permutations such as [0,0,1,2], [0,0,2,1], [0,1,2,0] and so on, but I don't want to obtain [0,0,1,2] twice, which is what the standard itertools.permutations(perm_vector) would give.

I tried the following, but it gets really slow as perm_vector grows in length:

import itertools
import pandas as pd

# Materialize every permutation (duplicates included) ...
vectors_list = []
for it in itertools.permutations(perm_vector):
    vectors_list.append(list(it))
# ... then drop the duplicate rows via a pandas groupby over all columns.
df_vectors_list = pd.DataFrame(vectors_list)
df_gb = df_vectors_list.groupby(list(df_vectors_list.columns))
vectors_list = pd.DataFrame(df_gb.groups.keys()).T

The question is of a more general "speed-up" nature, actually. Most of the time is spent creating the permutations of long vectors: even without the duplicates, generating the permutations of a vector of 12 unique values takes an "infinity". Is there a way to call itertools iteratively, without materializing the entire set of permutations at once, but working on batches of it instead?
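For what it's worth, itertools.permutations already returns a lazy iterator, so it can be consumed in fixed-size batches with itertools.islice instead of being materialized all at once; a minimal sketch (the batch size is arbitrary, and this alone does not remove the duplicates):

import itertools

perm_vector = [0, 0, 1, 2]
perm_iter = itertools.permutations(perm_vector)  # lazy: nothing is generated yet

batch_size = 5  # arbitrary, for illustration
while True:
    batch = list(itertools.islice(perm_iter, batch_size))
    if not batch:
        break
    # process the current batch here; duplicates still appear within and across batches
    print(batch)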

user3861925
  • Possible duplicate of [Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)](http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original) – Peter Wood Jan 11 '16 at 14:38
  • Here's an external [link](http://blog.bjrn.se/2008/04/lexicographic-permutations-using.html) from a comment in the thread referenced by the above comment that could be helpful. – Praveen Jan 11 '16 at 16:12
  • There is a recipe for this in the itertools module, check the unique_everseen recipe: https://docs.python.org/3/library/itertools.html#itertools-recipes – Copperfield Jan 11 '16 at 17:44
  • Something based on the idea of C++'s `std::next_permutation` may be appropriate; `std::next_permutation` handles duplicates the way you want. I recommend implementing it yourself at least once as a learning experience, but there are also [existing implementations available](http://matteolandi.blogspot.com/2009/09/python-nextpermutation_13.html). – user2357112 Mar 22 '17 at 17:09
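For reference, a minimal Python sketch of that next_permutation idea (the function name and driver loop below are illustrative, not from an existing library): start from the sorted vector and advance one step at a time; each step produces the next distinct permutation in lexicographic order, so duplicates never appear.

def next_permutation(seq):
    """Rearrange seq into its next lexicographic permutation in place.

    Returns False when seq is already the last (descending) arrangement,
    mirroring the behaviour of C++'s std::next_permutation.
    """
    # Find the rightmost index i with seq[i] < seq[i + 1].
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        return False
    # Find the rightmost index j > i with seq[j] > seq[i] and swap the two.
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    # Reverse the tail so it becomes the smallest arrangement of those values.
    seq[i + 1:] = reversed(seq[i + 1:])
    return True

# Usage: start from the sorted vector and step until exhausted.
vec = sorted([0, 0, 1, 2])
results = [tuple(vec)]
while next_permutation(vec):
    results.append(tuple(vec))
# results now holds each distinct permutation exactly once, in lexicographic order.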

2 Answers

1

Try this if perm_vector is small:

import itertools  # don't alias it as `iter`, which shadows the built-in

# Building a set drops the duplicate permutations automatically.
unique_perms = {p for p in itertools.permutations(perm_vector)}

This gives you unique values, because the result is a set, which removes duplicates by definition.
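As a quick illustration with the vector from the question, this collapses the 24 raw permutations down to the 12 distinct ones:

import itertools

perm_vector = [0, 0, 1, 2]
print(sum(1 for _ in itertools.permutations(perm_vector)))  # 24 raw permutations
print(len(set(itertools.permutations(perm_vector))))        # 12 after deduplication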

If perm_vector is large, you might want to try backtracking:

def permu(L, left, right, cache):
    # Fix positions [0, left) and try every remaining element at position `left`.
    for i in range(left, right):
        L[left], L[i] = L[i], L[left]
        L_tuple = tuple(L)
        # Only recurse into states that have not been seen before (memoization).
        if L_tuple not in cache:
            permu(L, left + 1, right, cache)
            L[left], L[i] = L[i], L[left]  # undo the swap (backtrack)
            cache[L_tuple] = 0

cache = {}
permu(perm_vector, 0, len(perm_vector), cache)
cache.keys()  # holds the distinct permutations as tuples
Rose
  • While this technically works, it still generates all duplicate permutations before filtering, so it's extremely inefficient when there are many duplicates. – user2357112 Mar 22 '17 at 17:06
  • @user2357112 That's true.. If the list is big, might need to use backtracking and memoization to make it more efficient. I posted my code above (it'd be great if there's a way to avoid "if" within the "for loop" though).. – Rose Mar 23 '17 at 02:12
0

How about this:

from collections import Counter

def starter(l):
    # Count how many copies of each value are available, then fill `res` recursively.
    cnt = Counter(l)
    res = [None] * len(l)
    return worker(cnt, res, len(l) - 1)

def worker(cnt, res, n):
    if n < 0:
        # Every position is filled: emit a copy of the current permutation.
        yield tuple(res)
    else:
        # Place each still-available value at position n, recurse, then restore the count.
        for k in cnt.keys():
            if cnt[k] != 0:
                cnt[k] = cnt[k] - 1
                res[n] = k
                for r in worker(cnt, res, n - 1):
                    yield r
                cnt[k] = cnt[k] + 1
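Presumably called like this (a small usage sketch; starter returns a generator that yields each distinct permutation exactly once):

perm_vector = [0, 0, 1, 2]
unique_perms = list(starter(perm_vector))
print(len(unique_perms))  # 12 distinct permutations for this input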
Roman Pekar