2

I need to add the ability to list suffixes to implement our autocomplete feature. To do that, I implemented a function on the TrieNode object that will return all complete word suffixes that exist below it in the trie. For example, if our Trie contains the words ["fun", "function", "factory"] and we ask for suffixes from the f node, we would expect to receive ["un", "unction", "actory"] back from node.get_suffixes(). Here is how I get started:

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self):
        pass

I have tested the get_suffixes function separately and it seemed to work fine.

result = []
def get_suffixes(node, suffix=""):
    if not node.children == dict():
        for key in node.children:
            suffix += key
            if node.children[key].word_end:
                result.append(suffix)
            get_suffixes(node.children[key], suffix)
            suffix = suffix[:-1]
    return result

How is how I tested the function:

# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True

result = []
def get_suffixes(node, suffix=""):
    if not node.children == dict():
        for key in node.children:
            suffix += key
            if node.children[key].word_end:
                result.append(suffix)
            get_suffixes(node.children[key], suffix)
            suffix = suffix[:-1]
    return result

get_suffixes(node.children["A"]) # Returns ['t', 'baca', 'dd', 'dmin'], as expected

The problem occured when I tried moving the get_suffixes function to the TrieNode class. Here I do not know how I should tackle the global variable result. It is not supposed to be a global variable anymore. I have tried two versions:

Version I: make result a class attribute

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    self.result.append(suffix)
                self.children[key].get_suffixes(suffix)
                suffix = suffix[:-1]   
        return self.result 

node.children["A"].get_suffixes() # Returns ['t'], which is wrong

Version II: make result a default function parameter

class TrieNode:

    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def suffixes(self, suffix="", result=[]):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    result.append(suffix)
                self.children[key].suffixes(suffix)
                suffix = suffix[:-1]   
        return result

node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin']
node.children["A"].suffixes() # Returns ['t', 'baca', 'dd', 'dmin', 't', 'baca', 'dd', 'dmin']

The result of Version II is not surprising because:

def append(number, number_list=[]):
    number_list.append(number)
    print(number_list)
    return number_list

append(5) # expecting: [5], actual: [5]
append(7) # expecting: [7], actual: [5, 7]
append(2) # expecting: [2], actual: [5, 7, 2]

I am learning algorithms and data structure in Python. I was asked to do it using a recursive function. Other approaches such as Implementing a Trie to support autocomplete in Python are not the answers I expect though they themselves might be able to solve the problem. I am extremely curious why self.result is not properly modified in Version I but works properly if it does not reside in a class.

James Chang
  • 608
  • 8
  • 21

1 Answers1

2

result belongs to the class TrieNode.

When you return self.result from the get_suffixes method, you are only including the answers found in the current TrieNode Instance.

You need to include the answers found by its children as well. Thanks to recursion the code just needs a minor change and adding self.result+=self.children[key].get_suffixes(suffix) makes everything work.

class TrieNode:
    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()
        self.result = []

    def insert(self, char):
        ## Add a child node in this Trie
        if not char in self.children:
            self.children[char] = TrieNode()

    def get_suffixes(self, suffix=""):
        if not self.children == dict():
            for key in self.children:
                suffix += key
                if self.children[key].word_end:
                    self.result.append(suffix)
                else:
                    self.result+=self.children[key].get_suffixes(suffix)
                suffix = suffix[:-1]   
        return self.result 



# Create a mock trie for the test
node = TrieNode()
node.insert("A")
node.children["A"].word_end = True
node.children["A"].insert("t")
node.children["A"].children["t"].word_end = True
node.children["A"].insert("b")
node.children["A"].children["b"].insert("a")
node.children["A"].children["b"].children["a"].insert("c")
node.children["A"].children["b"].children["a"].children["c"].insert("a")
node.children["A"].children["b"].children["a"].children["c"].children["a"].word_end = True
node.children["A"].insert("d")
node.children["A"].children["d"].insert("d")
node.children["A"].children["d"].children["d"].word_end = True
node.children["A"].children["d"].insert("m")
node.children["A"].children["d"].children["m"].insert("i")
node.children["A"].children["d"].children["m"].children["i"].insert("n")
node.children["A"].children["d"].children["m"].children["i"].children["n"].word_end = True


print(node.children["A"].get_suffixes())

Output:-

['t', 'baca', 'dd', 'dmin']

The thing to remember is that every child is a new instance of the TrieNode class and thus has its own separate result array.

Modified Insertion + No Result Array:-

class TrieNode:
    def __init__(self):
        ## Initialize this node in the Trie
        self.word_end = False
        self.children = dict()

    def insert(self, string):
        if len(string) == 0:
            self.word_end = True
            return
        ## Add a child node in this Trie
        if not string[0] in self.children:
            self.children[string[0]] = TrieNode()
        self.children[string[0]].insert(string[1:])

    def get_suffixes(self, suffix=""):
        query_result=[]
        if self.word_end:
            query_result.append(suffix)
        for i in self.children:
            query_result+=self.children[i].get_suffixes(suffix+i)
        return query_result




# Create a mock trie for the test
node = TrieNode()
node.insert("Add")
node.insert("At")
node.insert("Abaca")
node.insert("Admin")

print(node.children["A"].get_suffixes())
print(node.children["A"].get_suffixes())
print(node.children["A"].children["t"].get_suffixes())

Output:-

['dd', 'dmin', 't', 'baca']
['dd', 'dmin', 't', 'baca']
['']
[Finished in 0.0s]
asds_asds
  • 990
  • 1
  • 10
  • 19
  • @JamesChang I was wondering if you want the insert function to be that way? A recursive insert with a String instead of characters would be easier. – asds_asds Apr 25 '20 at 17:31
  • 1
    I've updated my answer. I see that this is a node of a trie, so what you did is correct as well. – asds_asds Apr 25 '20 at 17:47
  • There's a problem if you use += with result. It seems when you execute get_suffixes() multiple times the list returned keeps accumulating. This is similar to Version II in the question I showed you. – James Chang Apr 25 '20 at 18:33
  • 1
    A fix would be to either clear the result array after each query or use a temporary array to calculate the result to each query. `temp=self.result; self.result=[]; return temp;` I've updated the answer to reflect the changes. – asds_asds Apr 25 '20 at 18:48
  • Yeah, you're right, though this does not look very elegant. – James Chang Apr 25 '20 at 18:54
  • I agree, answer to a query should not modify the state of a Class Instance. The `result` array is not a characteristic of the TrieNode class, word_end and children make sense because without them we can't define a Trie Node. Maybe removing the `result` array altogether would be better? – asds_asds Apr 25 '20 at 18:57
  • yeah, but if I remove `result` in the class attribute there's no way the global variable `result` can reside. – James Chang Apr 25 '20 at 18:58
  • Also, I made a mistake in the definition of `get_suffixes` function. It failed to deal with some edge cases. I have updated my question. – James Chang Apr 25 '20 at 19:00
  • 1
    What if the `result` array is dropped altogether and each query result is stored in a variable and used as desired? Check the updated answer once. Tbh, I don't understand the requirement. Do you need a `result` array? If yes, do you need it to be global or as an attribute to the Class? You've already done the latter and included the code in the question. I'm just a little lost here. – asds_asds Apr 25 '20 at 19:06
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/212520/discussion-between-james-chang-and-asds-asds). – James Chang Apr 26 '20 at 03:31