6

I am trying to find the most efficient way to find permutation on a set of '0' and '1' given an index.

Ex: Given l = [0, 0, 1, 1]. All permutations in an ascending order is {0011, 0101, 0110, 1001, 1010, 1100}. These elements are indexed from 0 -> 5. Given index = 2, the result is 0110.

I found the algorithm here which inputs an integer multiset (e.g. l = [1, 2, 2]). His algorithm is efficient (O(N^2)). However, my multiset only consists of '0' and '1' and requires O(N) or less. N is the length of the list

Could you please kindly help me. Note that my real test is big (len(l) is 1024), so intertool library is not suitable. I am trying to speed up it as much as possible (e.g, using gmpy2...)

Based on 1, the following is my try but it is O(N^2)

from collections import Counter
from math import factorial
import gmpy2   

def permutation(l, index):
    if not index:
        return l

    counter = Counter(l)
    total_count = gmpy2.comb(len(l), counter['1'])
    acc = 0
    for i, v in enumerate(l):
        if i > 0 and v == l[i-1]:
            continue
        count = total_count * counter[v] / len(l)

        if acc + count > index:
            return [l[i]] + permutation(l[:i] + l[i + 1:], index - acc)
        acc += count

    raise ValueError("Not enough permutations")

l = ['0', '0', '1', '1']
index = 2
print (l, index)
   --> result = [0, 1, 1, 0]

Thanks in advance.

Community
  • 1
  • 1
santa
  • 193
  • 10
  • 2
    So basically, you're looking for the N'th binary number with a certain number of 1 bits? – Aaron Digulla Jul 15 '14 at 07:55
  • @AaronDigulla: what I mean is to found the index'th permuation in a set of all possible permutations. Each permutation has length n and consists of a given number of bit '1' – santa Jul 15 '14 at 08:01
  • @santa But the order matters? I mean you want index'th permutation in the ordered set of possible permutations? Or do you simply want a consistent and unique way of indexing pemutations? – freakish Jul 15 '14 at 08:11
  • @freakish: yes, all possible permutations are in an ascending order. I describe my example above: Ex: Given l = [0, 0, 1, 1]. All permutations in an ascending order is {0011, 0101, 0110, 1001, 1010, 1100}. These elements are indexed from 0 -> 5. Given index = 2, the result is 0110. – santa Jul 15 '14 at 08:16
  • @santa: That's the same as what I said. Converting the input set to a large binary number and then using a finite set of shift operations will get you what you want in O(index) time (i.e. just depending on the index you want) But I see how that doesn't help you much. Still, look at the binary patterns. Maybe you can speed up the process using precomputed results. – Aaron Digulla Jul 15 '14 at 09:45
  • I don't have the solution atm, but if the order in the input doesn't matter, you'd probably better simply write the number of 1's and the number of 0's, which is O(log(N)) and allows you to think of a better solution than O(N). – user189 Jul 15 '14 at 09:45
  • Anyway the desired output is O(N). – user189 Jul 15 '14 at 09:59
  • Do you know the maximum index in advance? – Aaron Digulla Jul 15 '14 at 10:11
  • @user189: Thank you. What I want to find is the corresponding permutation of a given index. So, the ascending order of the permutation list is necessary. – santa Jul 15 '14 at 13:17
  • @AaronDigulla: I just know the index spends from whole range from 0 to (the number of all permutations - 1). – santa Jul 15 '14 at 13:20
  • I meant in "l = [0, 0, 1, 1]", the only information is "there are 2 0's and 2 1's", you might not spot the difference for a small instance, but for a 1024 list, it will be shorter. – user189 Jul 15 '14 at 13:20
  • @santa: Well, since you always have 1024 bits in total, there can be at most 1024 different values for `len(permutations)`. So that's something you can pre-calculate (over night) and save in a table for fast lookup. – Aaron Digulla Jul 15 '14 at 13:24
  • Just an index does not seem enough to me. Don't we also need the length and the count of how many zeros and how many ones? – גלעד ברקן Jul 15 '14 at 19:11
  • Also, is "permutation" the right word? Wouldn't [1,1,1,1] have the same number of permutations as [0,0,1,1], namely 24 permutations? – גלעד ברקן Jul 15 '14 at 19:19
  • @גלעדברקן: The correct term is probably **anagram**. – pillmuncher Jul 15 '14 at 19:42
  • @pillmuncher yes, "anagram" seems to fit. – גלעד ברקן Jul 16 '14 at 00:54
  • @santa why is the result for index 2 in your code example (`1001`) different from the result for the same index in your explanation above (`0110`)? – גלעד ברקן Jul 16 '14 at 00:56
  • @גלעדברקן: Thank you, the input is the index, the number of of '1' and the number of '0'. Also, I am sorry since in my code, I copy and paste wrongly. The result should be [0, 1, 1, 0] – santa Jul 16 '14 at 03:07

4 Answers4

3

Let's think:

For n bits with k ones there are n choose k anagrams.

For each position, p, that the i`th left-most set-bit can occupy there are 
p choose (k-i) anagrams, for example:

n = 4, k = 2, i = 1 (left-most set-bit), position 1 => 001x => 1 choose 1 = 1
n = 4, k = 2, i = 1 (left-most set-bit), position 2 => 01xx => 2 choose 1 = 2

Given index 3 (non zero-based), we calculate the position of the 
left-most set-bit:

position 1, 1 choose (2-1) = 1 anagram, index 1
position 2, 2 choose (2-1) = 2 anagrams, index 2-3

We now know the left-most set-bit must be on position 2 and we know there 
are 2 anagrams possible. 

We look at the next set-bit (i = 2):
position 0, 0 choose (2-2) = 1 anagram, index 2
position 1, 1 choose (2-2) = 1 anagram, index 3

Therefore the second set-bit is in position 1 => 0110

I think this might be O(n*k) - I hope someone can understand/explain the
complexity better and perhaps improve/optimize this algorithm idea.
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
3

Given a permutations of N 0s and M 1s, we need to find the permutation with index K

We know that the number of permutations starting with 0 is equal to the number of permutations of N-1 0s and M 1s, let's call it K0.

if K > K0 =>  The permutation starts with 1, K remains the same
if k <= K0 => The permutation starts with 0, remove K0 from K

Fix the first bit and start again with K = K - K0 and the correct number of 0s and 1s.

This algorithm runs in O(n) where n is the number of bits (and not the length of the list).

In order to simplify the calculations, we assume a 1 based index (starting at 1)

Example:

n = xxxx
l = [0, 0, 1, 1]
K = 2 => 3
Number of permutations starting with 0: K0 = 3! / (2! * 1!) = 3
K <= K0 => first bit is a 0

n = 0xxx
l = [0, 1, 1]
K = K = 3
Number of permutations starting with 0: K0 = 2! / (2! * 0!) = 1
K > K0 => first bit is a 1

n = 01xx
l = [0, 1]
K = K - K0 = 2
Number of permutations starting with 0: K0 = 1! / (1! * 0!) = 1
K > K0 => first bit is a 1

n = 011x
l = [0]
K = K - K0 = 1
Number of permutations starting with 0: K0 = 1! / (0! * 0!) = 1
K <= K0 => first bit is a 0

n = 0110 Which is verified in your example.

Implementing this algorithm can be tricky, make sure to handle correctly the case where the whole list is composed of only 0s or 1s. Also computing the factorials might take sometime (and cause overflow in other languages), but it's possible to precompute them.

Samy Arous
  • 6,794
  • 13
  • 20
  • Thanks, just noticed we had a very similar approaches :) – Samy Arous Jul 16 '14 at 02:12
  • Yes, but you helped me see a more succinct and efficient way to think about it. – גלעד ברקן Jul 16 '14 at 03:02
  • @SamyArous: Thank you. In the first loop: why 'K = 2 => 3'? – santa Jul 16 '14 at 04:53
  • @SamyArous: Furthermore, in the 2nd loop: K = 3 > K0 = 1, K should be 3 based on ur alg, but why in loop 3rd, K = K-K0 = 2? – santa Jul 16 '14 at 05:03
  • @santa, as I said, to simplify calculations, we use a base 1 indexed array whereas, you use a base 0 indexed array. Therefor, index 2 in your example becomes index 3 in the algorithm – Samy Arous Jul 16 '14 at 12:42
  • @santa, By picking 1 as left most bit, we have eliminated all permutations that starts with 0 (which is K0 = 1) and therefor, we are no longer looking for the 3rd permutation, we are looking for the 2nd permutation that has a 1 as left most bit. – Samy Arous Jul 16 '14 at 12:44
2

Some ideas how you can try to attack this problem.

Here is a simple program to print all permutations:

import sys

oneBits = int(sys.argv[1])
totalLen = int(sys.argv[2])

low = 2**oneBits-1
end = 2**totalLen

print 'oneBits:',oneBits
print 'totalLen:',totalLen
print 'Range:',low,'-',end
print
format = '{0:0%db}' % totalLen
index = 0
print 'Index Pattern Value'
for i in range(low,end):
    val = format.format(i)
    if val.count('1') == oneBits:
        print '%5d %s %5d' % (index,val,i)
        index += 1

As you can see, it works purely on bit operations (well, I'm cheating a bit when counting the 1 bits :-)

When you run it with various inputs, you'll see that the input has patterns:

oneBits: 2
totalLen: 5
Range: 3 - 32

Index Pattern Value
    0 00011     3
    1 00101     5
    2 00110     6  <-- pure shift
    3 01001     9
    4 01010    10
    5 01100    12  <-- pure shift
    6 10001    17
    7 10010    18
    8 10100    20
    9 11000    24  <-- pure shift

So my first approach would be to find out the indexes where these pure shifts happen. The distances depend only on the number of 0 and 1 bits. Since the sum is always 1024, that means you should be able to pre-calculate those spots and store the results in a table with 1024 entries. This will get you close to the spot you want to be.

Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
  • Thank you, Aaron. I have not understood your approach yet. Do u mean I need to use the code to list all permutations first? If so, when totalLen = 1024, oneBits = 100, I cannot use 'for' since 'OverflowError: range() result has too many items'; furthermore, listing all the patterns may cost time? Could you kindly please explain me more. Thank you – santa Jul 15 '14 at 13:10
  • The program should help you identify patterns in the permutations. That should give you an idea for shortcuts. One such shortcut is to identify the closest index where the start pattern is just shifted by X (i.e. a couple of `0` are removed from the start and appended at the end). These can then serve as starting points to find the real value you're looking for. – Aaron Digulla Jul 15 '14 at 13:20
  • To see these patterns, you should probably not try anything with totalLen > 10 and oneBits > 4 because you just get too many values. – Aaron Digulla Jul 15 '14 at 13:22
1

Based on the idea of Samy Arous, I change his alg a bit:

if K >= K0 => The permutation starts with 1, K = K - K0
if K < K0  => The permutation starts with 0, K remains the same

The following is my code:

import gmpy2

def find_permutation (lst, K, numberbit1, numberbit0):
    l = lst
    N = numberbit0
    M = numberbit1

    if N == len(l):
        return '1' * N
    if M == len(l):
        return '1' * M

    result = ''    
    for i in range (0, len(lst)-1):
        K0 = gmpy2.comb(len(l)-1, M)
        if (K < K0):
            result += '0'
            l.remove ('0')
        else:
            result += '1'
            l.remove ('1')
            M -=1
            K = K - K0
    result += l[0]
    return result

lst = ['0','1','1', '1']
K = 1
numberbit1 = 3
numberbit0 = 1
print find_permutation (lst, K, numberbit1, numberbit0)
        --> result = '1011'

Thank you. Though it is O(n) x (the complexity of gmpy2.comb), it is better than the alg in my question.

santa
  • 193
  • 10