5

I'm trying to generate all the possible permutations of a string like "0000011111" or "000 11 2222 333". I tried using permutations from itertools on "0000011111" like so:

from itertools import permutations

basestring = "0"*5 +"1"*5 
perms = [''.join(p) for p in permutations(basestring)]
print(len(perms), perms)
print(len(set(perms)), set(perms))

But the list perms had 3 million entries when there are only 10 C 5 = 252 permutations.

Is there a built in tool I can use that is better at handling permutations of strings with many repeated characters?


Otherwise how is this algorithm for generating the permutations (for "0000 1111 222")?

Start with 2 characters        "0000 1111"
Move right most 0 over one     "0001 0111" and add it to the list
Continue moving it to the end  "0001 1011" -> "0001 1101" -> "0001 1110"

Now move the next 0 over one   "0010 0111" -> "0010 1011"
...
Until you get to "1111 0000".

Then for each of the strings generated, repeat the process with 2's.
222 xxxx xxxx -> 22x 2xxx xxxx -> 22x x2xx xxx...

Or am I better off just doing set(perms) to get rid of the repeats? (I need to permute 20 character lists with 3-5 characters where itertools permutations would give me 10e18 strings)


I've been programming casually for 3 years, but only know as much as someone with 1 semester of a programming class.

Frank Noam
  • 111
  • 7
  • 3
    Take a look at this [question](http://stackoverflow.com/questions/6284396/permutations-with-unique-values) and see if any of the answers help you. – Rusty Shackleford Aug 04 '15 at 03:01
  • 2
    Your explanation of what you are trying to do is incoherent. You say you are looking for permutations (nPr) and then give your calculation for nCr. You use the term "string" when you might mean "list", although in Python, strings are actually lists of characters and are iterable. It appears that you have a clear idea of what you seek, try not to make us guess. – msw Aug 04 '15 at 03:54

2 Answers2

0

Firstly let's look at your first example.

from itertools import permutations
basestring = "0"*5 +"1"*5

This gives basestring = [0000011111]

Calling permutations(basestring) without any argument will give you all permutations of length n of the n position string, which is just n! That's indeed a large number for n=10. Is that really what you want?

Next, if you are looking for the permutations of this string of length 5, you need to specify that length 5 in the call to itertools.permutations.

perms = [''.join(p) for p in permutations(basestring,5)]

This will return all permutations of length 5 of all characters in basestring by position, not value. So you will get some duplicates.

As documented in the itertools.permutations documentation see Python 2 version here, the number of permutations of length r on a string of length n returned by that function will be

n!/(n-r)! or in this case 30240 for n=10, r=5.

If you want to remove the duplicates, you can take

set(perms)

The number of combinations returned by this will be len(set(perms)) = 2^5 or 32. This is the number of strings of length k that can be formed from an "alphabet" of length n, which is n^k. The "alphabet" is the unique characters in your basestring - there are 2 of them (0 and 1) so you can form 32 unique strings of length 5 from that.

paisanco
  • 4,098
  • 6
  • 27
  • 33
  • Sorry if I was unclear. I want to use all the characters. "0" * 5 + "1" * 5 was just so that I didn't have to count out five 0's and 1's. Also 10 C 5 is a trick from combinatorics you can use to for counting the number of 2 symbol strings (Choose 5 of the 10 places to be 0's and fill the rest with 1's) The problem isn't that I'm getting a few duplicates, it's that I'm getting r! duplicates. Does a function like set() work on lists with 10e18 elements? – Frank Noam Aug 04 '15 at 07:44
  • @FrankNoam, calling itertools.permutations with no argument for permutation length will give you all n-length permutations of your n-position string, or n! Is that really what you want? Can you edit the question to clarify? I attempted to address why you were not getting what you expected in your example. If that is not what you are after you should edit. – paisanco Aug 07 '15 at 01:44
0

I'm not sure how efficient this is but you could try something like:

map = ['0','1','2','3']
perms = []

def multisetPerms(s,result):
  if all(v is 0 for v in s):
    perms.append(result)
    return
  for i in xrange(len(s)):
    if s[i] > 0:
      _s = s[:]
      _s[i] = _s[i] - 1
      multisetPerms(_s,result + map[i])

multisetPerms([3,2,4],'') # 9!/(3!*2!*4!) = 1260
print (len(perms), perms[0:10]) 

Output:

(1260, ['000112222', '000121222', '000122122', '000122212', '000122221'
      , '000211222', '000212122', '000212212', '000212221', '000221122'])
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61