I wrote code to solve the following problem, however it fails on the last two test cases. My logic used to solve the problem seems sound, and even after having a coworker review it, we both cannot figure out why it would work for the first 8 test cases but not the last two (which were randomly generated).
Problem:
Given a string, return the position of the inputted string in a alphabetically sorted list of all the possible permutations of the characters in that string. For example, the permutations of ABAB are [AABB,ABAB,ABBA,BAAB,BABA,BBAA] where the position of ABAB in that list is 2.
Logic I used to solve the problem:
For larger inputs, it is impossible (inefficient) to generate a list of permutations, so the point is to find the position without generating the alphabetical list. This can be done by finding the frequency of the characters. For the above example, the first charcater in ABAB is A, so before = 0 and after = .5, and between = 6, so you decrease max by .5*6 which is 3, for a minn of 1 and maxx of 3, to leave only [AABB,ABAB,ABBA], the perms with A as the first character! Then the characters left are BAB. minn = 1 and maxx = 3, and between = 3. So for B, before would be .33 and after would be 0, so increase minn by 3*.33 for a minn of 2 and maxx of 3, which equals [ABAB,ABBA] the perms that have AB as the first two characters! Keep doing that for every character and it will find the input in the list.
My Code:
## Imports
import operator
from collections import Counter
from math import factorial
from functools import reduce
## Main function, returns list position
def listPosition(word):
#turns string into list of numbers, A being 1, B being 2, and so on
val = [ord(char) - 96 for char in word.lower()]
#the result has to be between 1 and the number of permutations
minn = 1
maxx = npermutations(word)
#so we just increase the min and decrease the max based on the sum of freq
#of the characters less than and greater than each character
for indx in range(len(word)):
between = (maxx+1-minn)
before,after = sumfreq(val[indx:],val[indx])
minn = minn + int(round((between * before),0))
maxx = maxx - int(between * after)
return maxx #or minn, doesn't matter. they're equal at this point
## returns the number of permutations for the string (this works)
def npermutations(word):
num = factorial(len(word))
mults = Counter(word).values()
den = reduce(operator.mul, (factorial(v) for v in mults), 1)
return int(num / den)
## returns frequency as a percent for the character in the list of chars
def frequency(val,value):
f = [val.count(i)/len(val) for i in val]
indx = val.index(value)
return f[indx]
#returns sum of frequencies for all chars < (before) and > (after) the said char
def sumfreq(val,value):
before = [frequency(val,i) for i in [i for i in set(val) if i < value]]
after = [frequency(val,i) for i in [i for i in set(val) if i > value]]
return sum(before),sum(after)
tests= ['A','ABAB','AAAB','BAAA','QUESTION','BOOKKEEPER','ABCABC','IMMUNOELECTROPHORETICALLY','ERATXOVFEXRCVW','GIZVEMHQWRLTBGESTZAHMHFBL']
print(listPosition(tests[0]),"should equal 1")
print(listPosition(tests[1]),"should equal 2")
print(listPosition(tests[2]),"should equal 1")
print(listPosition(tests[3]),"should equal 4")
print(listPosition(tests[4]),"should equal 24572")
print(listPosition(tests[5]),"should equal 10743")
print(listPosition(tests[6]),"should equal 13")
print(listPosition(tests[7]),"should equal 718393983731145698173")
print(listPosition(tests[8]),"should equal 1083087583") #off by one digit?
print(listPosition(tests[9]),"should equal 5587060423395426613071") #off by a lot?