-2

I am trying to develop a method for finding the orderd rank of a particular sequence in the following lists.

a = list(sorted(itertools.combinations(range(0,5),3)))
b = list(sorted(itertools.permutations(range(0,5),3)))

a represents a list of elemnts of a combinatorial number system so the formula for rank is pretty straight forward.

What I need are 2 function magic_rank_1 and magic_rank_2 which have the following definitions

def magic_rank_1(perm_item,permutation_list_object): 
## permutation_list_object is actually b
    return list_object.index(perm_item)

def magic_rank_2(perm_item,permutation_list_object,combination_list_object):
## permutation_list_object is actually b and combination_list_object is actually a
    return combination_list_object.index(tuple(sorted(perm_item)))

So basically magic_rank_2((0,1,2),b,a) = magic_rank_2((2,0,1),b,a)

Sounds easy.. but i have a few restrictions.

  • I cant use the indexof function because I cannot afford to search lists >100000000 items for every item
  • I need magic_rank_1 and magic_rank_2 to be purely mathematical without using any sort function or comparison functions or search function. All the information I will have is the tuple whose rank needs to be identified and the last letter of my alphabet (in this case that will be 5)
    • magic rank 2 need not be a number between 0 and k-1 when k = len(a) as long as it is a unique number between 0 and 2^(ceiling((max_alphabet/2)+1))

I know magic_rank_1 can be calculated by something similar to this but there is a difference ,there every letter of the input alphabet is used, in my case it is a subset

Lastly yes this is supposed to be a substitute for a hashing function, currently using a hashing function but I am not taking advantage of the fact that magic_rank_2((0,1,2),b,a) = magic_rank_2((2,0,1),b,a) . If i can it will reduce my storage space requirements significantly as the length of my sequences is actually 5 , so if I can calculate a method for magic_rank_2 I reduce my storage requirement to 1% of current requirement

UPDATE - For magic_rank_2 there should be no comparison operation between elements of the tuple, i.e no sorting, min,max etc

that only makes the algorithm less efficient than regular hashing

Siddharth Chabra
  • 448
  • 6
  • 22
  • The `indexof` function is best implemented in hashing, like you said, if you can afford storing and indexing the lists beforehand for performance. Otherwise since `itertools.combinations` and `itertools.permutations` are both generator, meaning every time you invoke them you have to spend time iterating the whole lists, and it's slow anyway. Your input in magic_rank_2 only need to be sorted once. If you need to invoke `magic_rank_2` many times, you can wrap the loop in a `Cython` function. – Patrick the Cat Jun 17 '18 at 01:10
  • As for mathematical (?) methods, you can look at the source code of `itertools` to figure out the expected order of elements in the iteration of generator. Then you can try to calculate the exact index of perm_item from the expected order. I wouldn't do it that way, since nobody knows if that wouldn't be an NP hard problem and require a Ph.D. thesis. – Patrick the Cat Jun 17 '18 at 01:14
  • I wouldn't call it an NP hard problem, the final implementation is targeted for c anyways I am using a sorting tree to sort (i.e log(n)) but since my words are 7-10 digits longs that means 13-22 comparison operation for sorting these are proving to be a bottle neck,, also I need to take advantage of the controlled collisions since rank(x,y,z) = rank(y,z,x) – Siddharth Chabra Jun 17 '18 at 01:35
  • @Mai 1)Python is used for proof of concept and prototyping so an efficient algorithm in python works for C 2) I didn't say tree sort I said sorting tree there is a BIG difference 3) I asked am here for a more efficient way of doing things. instead of pining for upvotes to your answer its better to not answer If you cant help hope there is no next time – Siddharth Chabra Jun 17 '18 at 21:17
  • I am so sorry I can't. Good luck. – Patrick the Cat Jun 18 '18 at 00:16

1 Answers1

1

The following two functions will rank a combination and permutation, given a word and an alphabet (or in your case, a tuple and a list).

import itertools
import math

def rank_comb(word, alph, depth=0):
    if not word: return 0

    if depth == 0:
        word = list(word)
        alph = sorted(alph)

    pos = 0
    for (i,c) in enumerate(alph):
        if c == word[0]:
            # Recurse
            new_word = [x for x in word if x != c]
            new_alph = [x for x in alph if x > c]
            return pos + rank_comb(new_word, new_alph, depth+1)
        else:
            num = math.factorial(len(alph)-i-1)
            den = math.factorial(len(alph)-i-len(word)) * math.factorial(len(word)-1)
            pos += num // den


def rank_perm(word, alph, depth=0):
    if not word: return 0

    if depth == 0:
        word = list(word)
        alph = sorted(alph)

    pos = 0
    for c in alph:
        if c == word[0]:
            # Recurse
            new_word = [x for x in word if x != c]
            new_alph = [x for x in alph if x != c]
            return pos + rank_perm(new_word, new_alph, depth+1)
        else:
            num = math.factorial(len(alph)-1)
            den = math.factorial(len(alph)-len(word))
            pos += num // den


#== Validation =====================================================================
# Params
def get_alph(): return range(8)
r = 6

a = list(sorted(itertools.combinations(get_alph(), r)))
b = list(sorted(itertools.permutations(get_alph(), r)))

# Tests
PASS_COMB = True
PASS_PERM = True
for (i,x) in enumerate(a):
    j = rank_comb(x, get_alph())
    if i != j:
        PASS_COMB = False
        print("rank_comb() FAIL:", i, j)

for (i,x) in enumerate(b):
    j = rank_perm(x, get_alph())
    if i != j:
        PASS_PERM = False
        print("rank_perm() FAIL:", i, j)

print("rank_comb():", "PASS" if PASS_COMB else "FAIL")
print("rank_perm():", "PASS" if PASS_PERM else "FAIL")

The functions are similar, but there are few differences:

  • new_alph is filtered differently.
  • The calculation of both num and den are different.

Update:

rank_comb2 doesn't require sorting the input word (a 3-tuple):

import itertools
import math

def rank_comb2(word, alph, depth=0):
    if not word: return 0

    if depth == 0:
        word = list(word)
        alph = sorted(alph)

    pos = 0
    for (i,c) in enumerate(alph):
        if c == min(word):
            # Recurse
            new_word = [x for x in word if x != c]
            new_alph = [x for x in alph if x > c]
            return pos + rank_comb2(new_word, new_alph, depth+1)
        else:
            num = math.factorial(len(alph)-i-1)
            den = math.factorial(len(alph)-i-len(word)) * math.factorial(len(word)-1)
            pos += num // den

r1 = rank_comb2([2,4,1], range(5))
r2 = rank_comb2([1,4,2], range(5))
r3 = rank_comb2([4,1,2], range(5))

print(r1, r2, r3)     # 7 7 7
jedwards
  • 29,432
  • 3
  • 65
  • 92
  • this only takes care of magic_rank_1 so cant mark it as answer otherwise works great – Siddharth Chabra Jun 16 '18 at 15:23
  • No problem but could you maybe expand on what `magic_rank_2` does? Or what `rank_comb(sorted(word), alph)` doesn't do that you'd want? – jedwards Jun 16 '18 at 15:33
  • I have mentioned.. I don't want to sort the word it significantly slows down performance, looking for a purely mathematical function – Siddharth Chabra Jun 16 '18 at 16:18
  • Sorting a 3 element tuple significantly slows down performance? – jedwards Jun 16 '18 at 16:19
  • Anyway, I updated the answer with an approach that doesn't require sorting the input word, although it seems *less* efficient, not more. – jedwards Jun 16 '18 at 16:25
  • ofcource it is less efficient.. you are using min(word)??? that is just a symantic workaround not a solution.. by no sorting i mean no comparison between elements of the word – Siddharth Chabra Jun 17 '18 at 00:15
  • in the example its a 3 element tuple in reality they are larger(7-10) elements – Siddharth Chabra Jun 18 '18 at 08:37
  • For your `magic_rank_2` function, what if you set the bits of an int corresponding to the the tuple elements -- so `(2,0,1) = 111 = 8` as does `(0,1,2)`. (and for example, `(1,3,5,7) = (7,1,5,3) = etc. = 10101010 = 0xAA = 170`. Would this meet your requirements? – jedwards Jun 18 '18 at 12:07
  • nice approach, but memory becomes as issue when we deal with 2^51 etc, let me give it a shot though as stated my limit is 2^(max alhabet/2) – Siddharth Chabra Jun 18 '18 at 12:11