1

I wrote code to solve the following problem, however it fails on the last two test cases. My logic used to solve the problem seems sound, and even after having a coworker review it, we both cannot figure out why it would work for the first 8 test cases but not the last two (which were randomly generated).

Problem:

Given a string, return the position of the inputted string in a alphabetically sorted list of all the possible permutations of the characters in that string. For example, the permutations of ABAB are [AABB,ABAB,ABBA,BAAB,BABA,BBAA] where the position of ABAB in that list is 2.

Logic I used to solve the problem:

For larger inputs, it is impossible (inefficient) to generate a list of permutations, so the point is to find the position without generating the alphabetical list. This can be done by finding the frequency of the characters. For the above example, the first charcater in ABAB is A, so before = 0 and after = .5, and between = 6, so you decrease max by .5*6 which is 3, for a minn of 1 and maxx of 3, to leave only [AABB,ABAB,ABBA], the perms with A as the first character! Then the characters left are BAB. minn = 1 and maxx = 3, and between = 3. So for B, before would be .33 and after would be 0, so increase minn by 3*.33 for a minn of 2 and maxx of 3, which equals [ABAB,ABBA] the perms that have AB as the first two characters! Keep doing that for every character and it will find the input in the list.

My Code:

## Imports
import operator
from collections import Counter
from math import factorial
from functools import reduce
## Main function, returns list position
def listPosition(word):
    #turns string into list of numbers, A being 1, B being 2, and so on
    val = [ord(char) - 96 for char in word.lower()]
    #the result has to be between 1 and the number of permutations
    minn = 1
    maxx = npermutations(word)
    #so we just increase the min and decrease the max based on the sum of freq
    #of the characters less than and greater than each character
    for indx in range(len(word)):
            between = (maxx+1-minn)
            before,after = sumfreq(val[indx:],val[indx])
            minn = minn + int(round((between * before),0))
            maxx = maxx - int(between * after)
    return maxx #or minn, doesn't matter. they're equal at this point

## returns the number of permutations for the string (this works)
def npermutations(word):
    num = factorial(len(word))
    mults = Counter(word).values()
    den = reduce(operator.mul, (factorial(v) for v in mults), 1)
    return int(num / den)
## returns frequency as a percent for the character in the list of chars
def frequency(val,value):
    f = [val.count(i)/len(val) for i in val]
    indx = val.index(value)
    return f[indx]
#returns sum of frequencies for all chars < (before) and > (after) the said char
def sumfreq(val,value):
    before = [frequency(val,i) for i in [i for i in set(val) if i < value]]
    after = [frequency(val,i) for i in [i for i in set(val) if i > value]]
    return sum(before),sum(after)

tests= ['A','ABAB','AAAB','BAAA','QUESTION','BOOKKEEPER','ABCABC','IMMUNOELECTROPHORETICALLY','ERATXOVFEXRCVW','GIZVEMHQWRLTBGESTZAHMHFBL']
print(listPosition(tests[0]),"should equal 1")
print(listPosition(tests[1]),"should equal 2")
print(listPosition(tests[2]),"should equal 1")
print(listPosition(tests[3]),"should equal 4")
print(listPosition(tests[4]),"should equal 24572")
print(listPosition(tests[5]),"should equal 10743")
print(listPosition(tests[6]),"should equal 13")
print(listPosition(tests[7]),"should equal 718393983731145698173")
print(listPosition(tests[8]),"should equal 1083087583") #off by one digit?
print(listPosition(tests[9]),"should equal 5587060423395426613071") #off by a lot?
Cyber17C
  • 25
  • 2
  • 2
    My guess is floating-point round-off error. There are some very big numbers there. Try using integer arithmetic, taking advantage of Python's bignum support. – rici Aug 26 '18 at 04:07
  • Do you ignore duplicates? `AABB` would appear 4 times if you permute `ABAB` – AChampion Aug 26 '18 at 04:13
  • @AChampion Judging by the choice of the example word "bookkeeper", the permutations are unique permutations. – m69's been on strike for years Aug 26 '18 at 04:14
  • Confirmed this is a floating point issue: see [Is Floating Math Broken](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – AChampion Aug 26 '18 at 04:30

2 Answers2

3

You can use logic that only requires integer arithmetic. First, create the lexicographically first permutation:

BOOKKEEPER  ->  BEEEKKOOPR

Then, for each letter, you can calculate how many unique permutations it took to move it to its place. Since the first letter B is already in place, we can ignore it and look at the rest of the letters:

B EEEKKOOPR  (first)
B OOKKEEPER  (target)

To know how many permutations it takes to bring the O to the front, we calculate how many unique permutations there are with the E in front, then with the K in front:

E+EEKKOOPR -> 8! / (2! * 2! * 2!) = 40320 /  8 = 5040
K+EEEKOOPR -> 8! / (3! * 2!)      = 40320 / 12 = 3360

Where 8 is the number of letters to be permuted, and 2 and 3 are the number of multiples of the letters. So after 8400 permutations we are at:

BO EEEKKOPR

Now, again, we calculate how many permutations it takes to bring the second O to the front:

E+EEKKOPR -> 7! / (2! * 2!) = 5040 / 4 = 1260
K+EEEKOPR -> 7! / (3!)      = 5040 / 6 =  840

So after 10500 permutations we are at:

BOO EEEKKPR

Then we calculate how many permutations it takes to bring the K to the front:

E+EEKKPR -> 6! / (2! * 2!) = 720 / 4 = 180

So after 10680 permutations we are at:

BOOK EEEKPR

Then we calculate how many permutations it takes to bring the second K to the front:

E+EEKPR -> 5! / 2! = 120 / 2 = 60

So after 10740 permutations we are at:

BOOKK EEEPR

The next two letters are already in place, so we can skip to:

BOOKKEE EPR

Then we calculate how many permutations it takes to get the P to the front:

E+PR -> 2! = 2

So after 10742 permutations we are at:

BOOKKEEP ER

And the last two letters are also already in order, so the answer is 10743 (add 1 because the 1-based index is asked).

2

As @rici points out this is a floating point error (see Is floating point math broken?). Fortunately python has fractions.

A couple of judicious uses of fractions.Fraction fixes the issue without changing the body of the code, e.g.:

from fractions import Fraction
...
## returns the number of permutations for the string (this works)
def npermutations(word):
    num = factorial(len(word))
    mults = Counter(word).values()
    den = reduce(operator.mul, (factorial(v) for v in mults), 1)
    return int(Fraction(num, den))
## returns frequency as a percent for the character in the list of chars
def frequency(val,value):
    f = [Fraction(val.count(i),len(val)) for i in val]
    indx = val.index(value)
    return f[indx]
...

In []:
print(listPosition(tests[0]),"should equal 1")
print(listPosition(tests[1]),"should equal 2")
print(listPosition(tests[2]),"should equal 1")
print(listPosition(tests[3]),"should equal 4")
print(listPosition(tests[4]),"should equal 24572")
print(listPosition(tests[5]),"should equal 10743")
print(listPosition(tests[6]),"should equal 13")
print(listPosition(tests[7]),"should equal 718393983731145698173")
print(listPosition(tests[8]),"should equal 1083087583")
print(listPosition(tests[9]),"should equal 5587060423395426613071")

Out[]:
1 should equal 1
2 should equal 2
1 should equal 1
4 should equal 4
24572 should equal 24572
10743 should equal 10743
13 should equal 13
718393983731145698173 should equal 718393983731145698173
1083087583 should equal 1083087583
5587060423395426613071 should equal 5587060423395426613071

Updated

Based on @m69's excellent explanation here's a much simpler implementation:

from math import factorial
from collections import Counter
from functools import reduce
from operator import mul

def position(word):
    charset = Counter(word)
    pos = 1    # Per OP 1 index
    for letter in word:
        chars = sorted(charset)
        for char in chars[:chars.index(letter)]:
            ns = Counter(charset) - Counter([char])
            pos += factorial(sum(ns.values())) // reduce(mul, map(factorial, ns.values()))
        charset -= Counter([letter])
    return pos

Which gives the same results as above:

In []:
tests = ['A', 'ABAB', 'AAAB', 'BAAA', 'QUESTION', 'BOOKKEEPER', 'ABCABC',
         'IMMUNOELECTROPHORETICALLY', 'ERATXOVFEXRCVW', 'GIZVEMHQWRLTBGESTZAHMHFBL']
print(position(tests[0]),"should equal 1")
print(position(tests[1]),"should equal 2")
print(position(tests[2]),"should equal 1")
print(position(tests[3]),"should equal 4")
print(position(tests[4]),"should equal 24572")
print(position(tests[5]),"should equal 10743")
print(position(tests[6]),"should equal 13")
print(position(tests[7]),"should equal 718393983731145698173")
print(position(tests[8]),"should equal 1083087583")
print(position(tests[9]),"should equal 5587060423395426613071")

Out[]:
1 should equal 1
2 should equal 2
1 should equal 1
4 should equal 4
24572 should equal 24572
10743 should equal 10743
13 should equal 13
718393983731145698173 should equal 718393983731145698173
1083087583 should equal 1083087583
5587060423395426613071 should equal 5587060423395426613071
AChampion
  • 29,683
  • 4
  • 59
  • 75
  • 1
    I figured it had something to do with floating point numbers, but my solution was to use rounding which didn't help. I never knew fractions existed in Python! This definitely has helped me understand where it went wrong. – Cyber17C Aug 26 '18 at 11:49