3

In the recursive function below, I've applied some memoization techniques in python to save previous calculate values in a dictionary, which in theory should have storing and retrieval of O(1). However, the execution time in about three times more when using memoization than simple recursion.

In the two pieces of the code below, what would be the possible cause for such higher execution time in the second code? Am I using dictionaries the wrong way?

Simple recursive function:

import time

def deletion_distance(str1, str2):
  if str1 is "":
    return len(str2)
  elif str2 is "":
    return len(str1)
  else:
    if str1[len(str1) - 1] == str2[len(str2) - 1]:
      return deletion_distance(str1[:-1],str2[:-1])
    else:
      return 1 + min(deletion_distance(str1, str2[:-1]),deletion_distance(str1[:-1],str2))

str1 = "dragonified"
str2 = "infinitezimal"

start = time.time()
for i in range(0,2):
    deletion_distance(str1,str2)
end = time.time()

print((end - start) / 2)

Dynamic programming using dictionaries for memoization:

import time

def deletion_distance(str1, str2):
  global aux_dict

  if str1 is "":
    return len(str2)
  elif str2 is "":
    return len(str1)
  else:
    if str1[len(str1) - 1] == str2[len(str2) - 1]:
      if "1"+str1[:-1]+"2"+str2[:-1] in aux_dict:
        return aux_dict["1"+str1[:-1]+"2"+str2[:-1]]
      else:
        aux_dict["1"+str1[:-1]+"2"+str2[:-1]] = deletion_distance(str1[:-1],str2[:-1])
        return aux_dict["1"+str1[:-1]+"2"+str2[:-1]]
    else:

      return 1 + min(
        aux_dict["1"+str1+"2"+str2[:-1]] if "1"+str1+"2"+str2[:-1] in aux_dict else deletion_distance(str1, str2[:-1]),
        aux_dict["1"+str1[:-1]+"2"+str2] if "1"+str1[:-1]+"2"+str2 in aux_dict else deletion_distance(str1[:-1],str2))

aux_dict = {}      
str1 = "dragonified"
str2 = "infinitezimal"

start = time.time()
for i in range(0,2):
    deletion_distance(str1,str2)
end = time.time()

print((end - start) / 2)

Both functions calculate the deletion distance of two strings (PRAMP.COM question), which is simply the minimum number of character of both strings that much be removed from the two strings so that they become the same.

  • 3
    Note: if the number of misses is far larger than the number of hits, i.e. you're mostly storing in the dict than reading, then memoization may worsen rather than improve performance. – Moses Koledoye Nov 08 '17 at 20:07
  • 4
    Don't use `is` to compare strings. – Daniel Nov 08 '17 at 20:09
  • 1
    Yes, you are doing it wrong. `aux_dict` must not be a local. – Robᵩ Nov 08 '17 at 20:11
  • 2
    Is this being memoized properly? Your function calls create a new, empty `dict` each time...am I missing something? – juanpa.arrivillaga Nov 08 '17 at 20:11
  • Daniel: Why should I use is to compare strings? (Which shouldn't be the problem because I'm doing it in both version) Rob/Juanpa -> you are totally correct. I ended using it locally instead of using global. Still, it doesn't fix the problem as it's still slower. –  Nov 08 '17 at 20:15
  • 1
    @Nelsão because of [this](https://stackoverflow.com/questions/2987958/how-is-the-is-keyword-implemented-in-python) – yinnonsanders Nov 08 '17 at 20:16
  • 1
    @Nelsão because `is` compares *identity*, and you want equality, you should use `==`. It doesn't address this issue, but it is a bug waiting to happen. – juanpa.arrivillaga Nov 08 '17 at 20:16
  • Thanks for the explanation. Seems like the opposite of java, in which == is the memory location while str.is() is the content of the string. –  Nov 08 '17 at 20:24

1 Answers1

2

You don't use the dictionary at all, because you use a new empty dictionary for each function call.

aux_dict = {}

def deletion_distance(str1, str2):
    if (str1, str2) in aux_dict:
        return aux_dict[str1, str2]
    if not str1:
        return len(str2)
    elif not str2:
        return len(str1)
    elif str1[-1] == str2[-1]:
        result = deletion_distance(str1[:-1],str2[:-1])
    else:
        result = 1 + min(
            deletion_distance(str1, str2[:-1]),
            deletion_distance(str1[:-1], str2),
        )
    aux_dict[str1, str2] = result
    return result

which reduces the amount of calls for the two example strings from 1685178 to 268, and therefore the cached version is 3000 times faster.

EDIT: in your updated question, you still don't use the dictionary correctly. Only in the case, that the last character of both strings are equal, the dictionary is used.

Daniel
  • 42,087
  • 4
  • 55
  • 81
  • True. I did fix it on my code but the second version is still slower. Fixing it in the question. –  Nov 08 '17 at 20:19
  • After making the change, it improved a bit but is still about 2x slower than the regular recursion (used a loop of 100000 for each version). –  Nov 08 '17 at 20:31