Runtime error with recursion and memoization algorithm

Question

I am trying to solve https://www.hackerrank.com/challenges/password-cracker/problem problem Here is the problem statement

There are N users registered on a website CuteKittens.com. Each of them have a unique password represented by pass[1], pass[2], ..., pass[N]. As this a very lovely site, many people want to access those awesomely cute pics of the kittens. But the adamant admin doesn't want this site to be available for general public. So only those people who have passwords can access it.

Yu being an awesome hacker finds a loophole in their password verification system. A string which is concatenation of one or more passwords, in any order, is also accepted by the password verification system. Any password can appear 0 or more times in that string. He has access to each of the N passwords, and also have a string loginAttempt, he has to tell whether this string be accepted by the password verification system of the website.

For example, if there are 3 users with password {"abra", "ka", "dabra"}, then some of the valid combinations are "abra" (pass[1]), "kaabra" (pass[2]+pass[1]), "kadabraka" (pass[2]+pass[3]+pass[2]), "kadabraabra" (pass[2]+pass[3]+pass[1]) and so on.

Input Format

First line contains an integer T, the total number of test cases. Then T test cases follow. First line of each test case contains N, the number of users with passwords. Second line contains N space separated strings, pass[1] pass[2] ... pass[N], representing the passwords of each user. Third line contains a string, loginAttempt, for which Yu has to tell whether it will be accepted or not

I tried to solve it using recursion along with memoization Here is my solution in python3

#!/bin/python3

import sys

def passwordCracker(passwords, attempt, memo):

    if attempt in memo:
        return memo[attempt]

    for pwd in passwords:
        if attempt.startswith(pwd):
            if len(pwd) == len(attempt):
                memo[attempt] = [pwd]

                return [pwd]

            prefix_size = len(pwd)
            cracked_pwds = passwordCracker(passwords, attempt[prefix_size:], memo)

            if len(cracked_pwds) != 0:              
                cracked_pwds.append(pwd)               
                memo[attempt] = list(cracked_pwds)

                return cracked_pwds

    memo[attempt] = []

    return []

def print_passwords(passwords, attempt, memo):

    #passwords.sort(key = len)
    result = passwordCracker(passwords, attempt, memo)
    result.reverse()
    #print (memo)
    if len(result) == 0:
        print ("WRONG PASSWORD")
    else:
        res_str = result[0] 
        for w in result[1:]:
            res_str = res_str + " " + w
        print(res_str)

if __name__ == "__main__":
    t = int(input().strip())
    for a0 in range(t):
        memo = {}
        n = int(input().strip())
        passwords = input().strip().split(' ')
        attempt = input().strip()
        print_passwords(passwords, attempt, memo)

But it is failing a lot of test cases with runtime error or Time limit exceeded. One particular test case for which it fails is with input

1
10
a aa aaa aaaa aaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaa aaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab

I am not able to figure out how to improve this. Can anyone please suggest a better alternative solution.

Maybe I'm wrong, but it looks like an [evil regex issue](https://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex). You are matching the `aaa...aaab` string against the regex `(a{1,10})+`. It won't work... — jferard, Apr 14 '18 at 22:41

r3mainer · Answer 1 · 2018-04-15T02:24:38.990

Your memo[] cache is terribly inefficient.

All you have to do in this exercise is insert spaces into attempt so as to separate it into valid passwords, or return WRONG PASSWORD if this is not possible.

Therefore, the only information you need to keep track of is where these spaces should be added.

You can do this without a dict, and without creating any partial lists of matched strings.

Just start with a simple list of numbers (e.g., memo = [-1] * len(attempt)). Whenever you find a match, store the index of the next unmatched character. If this index happens to be equal to len(attempt), then you can obviously stop searching. If no match exists for a particular location, store that result too.

When you get a successful match, step through the list of string offsets to find out where the spaces need to be inserted.

EDIT:

I had a crack at solving this problem at HackerRank. The above method basically worked fine, but needed a few minor tweaks to pass all the tests. In particular, you'll probably need to call sys.setrecursionlimit to allow for a greater level of recursion.

score 2 · Answer 2 · answered Apr 15 '18 at 19:17

I want to add a few hints to @squeamish-ossifrage's answer. You are exploring a colossal number of possibilities. Let's say the attempt has a size n and you have k passwords. You can imagine your exploration as a big tree where every node has k children, namely the next possible passwords in the sequence. If the minimum length of a password is l, your algorithm has an approximate time complexity of O(k^n/l), since you will build every string of n/l passwords and that's why your solution ends with a Time limit exceeded.

As @squeamish-ossifrage wrote, you can improve your memoization technique. But there are other points to consider.

First, in your example, attempt contains a b that is absent of every password. More formally, if the set of characters of attempt is a strict superset of the set of characters of the passwords, you are sure you won't find a solution. If L is the maximum length of the password, the time complexity is O(n+kL).

Second, try to prune the research tree (ie reduce the number of children of every node). How to do this? Simply try to limit the number of acceptable passwords.

The basic idea is that if the attempt does not contain a password, then this password should be excluded from the exploration. You can repeat this operation a each step, since attempt[prefix_size:] may not contain a password that was contained in attempt. Time complexity: O(kn/l) in the best cases with Python search algorithm, at each step.

A more sophisticated idea is to remove password that are composed of other passwords. Since you have a, you don't need aa, aaa, etc. How? You just have to apply the passwordCracker with p as the attempt and the rest of the passwords as passwords (I don't go into detail here). If you find a sequence of other passwords that is equal to p, you don't need p and can remove it from the passwords list. It should be done once at the beginning.

The challenge is solvable using only these techniques and the increase of recursion limit, keeping you memo[] cache.

Excellent advice. For the record, the "minor tweaks" I referred to included a quick sanity check `if not set(attempt).issubset(''.join(pwd)): return false` and sorting the passwords in descending order of length `pwd.sort(key=lambda x: -len(x))`, both performed before calling `passwordCracker()`. Meanwhile, in the `passwordCracker()` function, remember to invalidate positions where no valid break could be found (i.e., if the `for pwd in passwords` loop completes without returning, then mark this position as "impossible" and don't check it again. — r3mainer, Apr 15 '18 at 21:13

גלעד ברקן · Answer 3 · 2018-04-16T01:18:48.217

2

Here's some passing code. We don't necessarily need to check supersets or use clever preprocessing heuristics. One trick is that, although there could be an inordinate number of possibilities, it's enough to return one of them. Meaning, if we have a solution up to index i, we are free to ignore all other possibilities for forming the same length prefix of loginAttempt. This Python code uses two lists, one for starting indexes we already visited and one for saving the chosen sequence of passwords that form the prefix up to the next starting index we'll check.

def passwordCracker(passwords, loginAttempt):
    memo = [[]] + [None for i in xrange(len(loginAttempt))]
    stack = [0]
    visited = set()

    while stack:
      i = stack.pop() # next index

      if i in visited:
        continue

      visited.add(i)

      for idx, p in enumerate(passwords):
        if i + len(p) > len(loginAttempt):
          continue

        if loginAttempt[i:i + len(p)] == p:
          memo[i + len(p)] = memo[i][:] + [idx]
          stack.append(i + len(p))

    if memo[len(memo) - 1]:
      return " ".join(map(lambda i: passwords[i], memo[len(memo) - 1]))

    else:
      return "WRONG PASSWORD"

edited Apr 16 '18 at 01:18

answered Apr 16 '18 at 00:28

גלעד ברקן

23,602
3
25
61

The OP didn't ask for the code. This is a spoiler! Beyond that, I agree that an iterative DFS is better than a recursive one, and that your equivalent of the original memoization is great, but I feel that every time you are doing an exhaustive research, pruning the tree is the first thing to do. For this problem, it was enough. – jferard Apr 16 '18 at 11:34
@jferard thanks for commenting. The OP asked to "suggest a better alternative solution." In that case I figured it might be OK to show code since it would be instructive for the OP, considering that it includes various programming aspects that might be quick and easier to show than to explain. – גלעד ברקן Apr 16 '18 at 13:19
No worries. It's indeed instructive for the OP. I thought giving some guidelines was better than code, but if the OP asked that question, he was aware he might got some code as answer. In addition to my previous comment: a good memoization is also a way to "prune" the tree, since you merge some subtrees. – jferard Apr 16 '18 at 13:50

Runtime error with recursion and memoization algorithm

3 Answers3