3

I solved a puzzle but need to optimize my solution. The puzzle says that I am to take a string S, find all permutations of its characters, sort my results, and then return the one-based index of where S appears in that list.

For example, the string 'bac' appears in the 3rd position in the sorted list of its own permutations: ['abc', 'acb', 'bac', 'bca', 'cab', 'cba'].

My problem is that the puzzle limits my execution time to 500ms. One of the test cases passed "BOOKKEEPER" as an input, which takes ~4.2s for me to complete.

I took a (possibly naive) dynamic programming approach using memoization using a dict keyed by one particular permutation of some character set, but that's not enough.

What is my bottleneck?

I'm profiling in the meantime to see if I can answer my own question, but I invite those who see the problem outright to help me understand how I slowed this down.

EDIT: My solution appears to outperform itertools.permutations. 10+ seconds for input "QUESTION". But to be fair, this includes time printing so this might not be a fair comparison. Even so, I'd rather submit a handcoded solution with competitive performance knowing why mine was worse than to opt for a module.

memo = {}

def hash(word):
    return ''.join(sorted(word))

def memoize(word, perms):
    memo[hash(word)] = perms
    return perms

def permutations(word, prefix = None):
    """Return list of all possible permutatons of given characters"""
    H = hash(word)

    if H in memo:
        return [s if prefix is None else prefix + s for s in memo[H]]

    L = len(word)

    if L == 1:
        return [word] if prefix is None else [prefix + word]

    elif L == 2:
        a = word[0] + word[1]
        b = word[1] + word[0]

        memoize(word, [a, b])

        if prefix is not None:
            a = prefix + a
            b = prefix + b

        return [a, b]

    perms = []
    for i in range(len(word)):
        perms = perms + permutations(word[:i] + word[i+1:], word[i])

    memoize(word, perms)

    return [prefix + s for s in perms] if prefix is not None else perms


def listPosition(word):
  """Return the anagram list position of the word"""
  return sorted(list(set(permutations(word)))).index(word) + 1

print listPosition('AANZ')
Sage Gerard
  • 1,311
  • 8
  • 29
  • How does `itertools.permutations` perform? See the `timeit` module or IPython magic `%timeit` – Wayne Werner Sep 22 '16 at 00:12
  • @WayneWerner Oh, forgot to mention that. Way worse. 10s+ for input "QUESTION" – Sage Gerard Sep 22 '16 at 00:12
  • @WayneWerner see edit in question – Sage Gerard Sep 22 '16 at 00:17
  • 2
    I would think you can come up with a formula to calculate the position of the input word, based on the alphabetic ordering (since the list is sorted alphabetically) of the letters. If I understand the puzzle correctly, it only asks to return the position of the input, not all of the permutations. So you'll want to grab some pen and paper and find a formulation of that problem. –  Sep 22 '16 at 00:21
  • @Evert I understand and did not even consider that possibility. Since that interpretation changes the focus away from optimizing an existing algorithm I'll leave the question as is, but will reconsider my approach in the meantime. – Sage Gerard Sep 22 '16 at 00:23
  • 1. There's an [*easy way*](http://stackoverflow.com/a/4299378/23771) to see which lines of code are responsible for the time. My guess is the `perms = perms + ...` line because it's major memory operations, but that method will tell for sure. 2. Generating permutations in place is trivial, and you can use the fact that you know what the next character should be to avoid generating all of them. (If you're generating a bunch that *don't* start with the right character, you don't need to generate them, because you already know how many there will be.) – Mike Dunlavey Sep 22 '16 at 00:40

3 Answers3

3

I believe the answer is to not produce all the permutations nor sort them. Let's keep it simple and see how it compares performance-wise:

import itertools

def listPosition(string):
    seen = set()

    target = tuple(string)

    count = 1;

    for permutation in itertools.permutations(sorted(string)):
        if permutation == target:
            return count
        if permutation not in seen:
            count += 1
            seen.add(permutation)

print(listPosition('BOOKKEEPER'))

TIMINGS (in seconds)

           Sage/Evert  Mine  Sage     Answer
QUESTIONS     0.02     0.18  0.45      98559
BOOKKEEPER    0.03     0.11  2.10      10743
ZYGOTOBLAST   0.03     24.4  117(*)  9914611

(*) includes ~25 second delay between printing of answer and program completion

The output from Sci Prog's code did not produce answers that agreed with the other two as it produced larger indexes and multiple of them so I didn't include its timings which were lengthy.

cdlane
  • 40,441
  • 5
  • 32
  • 81
  • You are right, my program is incorrect: when some letters are repeated (e.g. "BOOKKEEPER" has two "O", two "K" and three "E"), my program does not remove the resulting words that occur multiple times (hence the larger indexes). However, the conclusion is the same: *do not generate all the permutations*. – Sci Prog Sep 22 '16 at 02:09
3

Providing my own answer under the assumption that a good way to optimize code is to not use it in the first place. Since I strongly emphasized identifying ways to speed up the code I posted, I'm upvoting everyone else for having made improvements in that light.

@Evert posted the following comment:

I would think you can come up with a formula to calculate the position of the input word, based on the alphabetic ordering (since the list is sorted alphabetically) of the letters. If I understand the puzzle correctly, it only asks to return the position of the input, not all of the permutations. So you'll want to grab some pen and paper and find a formulation of that problem.

Following this reasoning, among similar suggestions from others, I tried an approach based more in enumerative combinatorics:

from math import factorial as F
from operator import mul

def permutations(s):
    return F(len(s)) / reduce(mul, [F(s.count(c)) for c in set(s)], 1)

def omit(s,index):
    return s[:index] + s[index+1:]

def listPosition(s):
    if (len(s) == 1):
        return 1

    firstletter = s[0]
    predecessors = set([c for c in s[1:] if c < firstletter])
    startIndex = sum([permutations(omit(s, s.index(c))) for c in predecessors])

    return startIndex + listPosition(s[1:])

This produced correct output and passed the puzzle at high speed (performance metrics not recorded, but noticably different). Not a single string permutation was actually produced.

Take as an example input QUESTION:

We know that wherever in the list "QUESTION" appears, it will appear after all permutations that start with letters that come before "Q". The same can be said of substrings down the line.

I find the letters that come before firstletter = 'Q', which is stored in predecessors. The set prevents double counting for input with repeated letters.

Then, we assume that each letter in predecessors acts as a prefix. If I omit that prefix from the string and find the sum of permutations of the remaining letters, we find the number of permutations that must appear before the initial input's first letter. Recurse, then sum the results, and you end up with the start position.

Sage Gerard
  • 1,311
  • 8
  • 29
2

Your bottleneck resides in the fact that the number of permutations of a list of N items is N! (N factorial). This number grows very fast as the input increases.

The first optimisation you can do is that you do not have to store all the permutations. Here is a recursive solution that produces all the permutations already sorted. The "trick" is to sort the letters of the word before generating the permutations.

def permutations_sorted( list_chars ):
  if len(list_chars) == 1:  # only one permutation for a 1-character string     
    yield list_chars
  elif len(list_chars) > 1:
    list_chars.sort()
    for i in range(len(list_chars)):
      # use each character as first position (i=index)                          
      head_char = None
      tail_list = []
      for j,c in enumerate(list_chars):
        if i==j:
          head_char = c
        else:
          tail_list.append(c)
      # recursive call, find all permutations of remaining                      
      for tail_perm in permutations_sorted(tail_list):
        yield [ head_char ] + tail_perm

def puzzle( s ):
  print "puzzle %s" % s
  results = []
  for i,p_list in enumerate(permutations_sorted(list(s))):
    p_str = "".join(p_list)
    if p_str == s:
      results.append( i+1 )
  print "string %s was seen at position%s %s" % (
    s,
    "s" if len(results) > 1 else "",
    ",".join(["%d" % i for i in results])
  )
  print ""


if __name__ == '__main__':
  puzzle("ABC")       

Still, that program takes a long time to run when the input is large. On my computer (2.5 GHz Intel core i5)

  • Input = "ABC" (3 characters): 0.03 seconds
  • Input = "QUESTION" (8 characters): 0.329 seconds
  • Input = "QUESTIONS" (9 characters): 2.848 seconds
  • Input = "BOOKKEEPER" (10 characters): 30.47 seconds

The only way to "beat the clock" is to figure a way to compute the position of the string without generating all the permutations.

See the comment by Evert above.

N.B. When the input contains letters that are repeated, the initial string is seen at more than one place. I assume you have to report only the first occurence.

Sci Prog
  • 2,651
  • 1
  • 10
  • 18
  • See the comment by @cdlane. My program does not give the correct answers. However, the conclusion still holds. – Sci Prog Sep 22 '16 at 02:13