1

I've read the following implementation of the trie in python: https://stackoverflow.com/a/11016430/2225221

and tried to make the remove fnction for it. Basically, I had problems even with the start: If you want to remove a word from a trie, it can has sub-"words", or it can be "subword" of another word.

If you remove with "del dict[key]", you are removing these above mentioned two kinds of words also. Could anyone help me in this, how to remove properly the chosen word (let us presume it's in the trie)

Community
  • 1
  • 1
John Berry
  • 179
  • 1
  • 3
  • 9

8 Answers8

4

Basically, to remove a word from the trie (as it is implemented in the answer you linked to), you'd just have to remove its _end marker, for example like this:

def remove_word(trie, word):
    current_dict = trie
    for letter in word:
        current_dict = current_dict.get(letter, None)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        del current_dict[_end]

Note however that this doesn't ensure that the trie has its minimal size. After deleting the word, there may be branches in the trie left that are no longer used by any words. This doesn't affect the correctness of the data structure, it just means that the trie may consume more memory than absolutely necessary. You could improve this by iterating backwards from the leaf node and delete branches until you find one that has more than one child.

EDIT: Here's an idea how you could implement a remove function that also culls any unnecessary branches. There's probably a more efficient way to do it, but this might get you started:

def remove_word2(trie, word):
    current_dict = trie
    path = [current_dict]
    for letter in word:
        current_dict = current_dict.get(letter, None)
        path.append(current_dict)
        if current_dict is None:
            # the trie doesn't contain this word.
            break
    else:
        if not path[-1].get(_end, None):
            # the trie doesn't contain this word (but a prefix of it).
            return
        deleted_branches = []
        for current_dict, letter in zip(reversed(path[:-1]), reversed(word)):
            if len(current_dict[letter]) <= 1:
                deleted_branches.append((current_dict, letter))
            else:
                break
        if len(deleted_branches) > 0:
            del deleted_branches[-1][0][deleted_branches[-1][1]]
        del path[-1][_end]

Essentially, it first finds the "path" to the word that is about to be deleted and then iterates through that backwards to find nodes that can be removed. It then removes the root of the path that can be deleted (which also implicitly deletes the _end node).

omz
  • 53,243
  • 5
  • 129
  • 141
  • Thanks, really good idea! I only have problems with the backward iterations now. Since you can go in any dict, to get the key/value, but you can't (as far I know) get the "parent" dict.. But If you want to re-add the same word, you just "re-add" the _end sign, so thanks! :) – John Berry Mar 29 '13 at 19:25
  • Yep, that's kind of tricky without having direct access to the "parent" dict, see my edit for an idea of how you could do it without altering the overall data structure. – omz Mar 29 '13 at 20:09
  • Edited again to make it more efficient. Basically, it's unnecessary to remove all deleted dictionaries separately, it should be sufficient to just remove the "root" of the found path (all the other ones will be children of that). – omz Mar 29 '13 at 20:23
  • You can simplify `remove_word2` a bit by saving different info during the search for the word (the first for loop). Instead of always adding the current dictionary to your `path` list, you can just save the current dictionary and letter if `len(current_dict) > 1` (replacing any previously saved values). Then after the loop ends, delete the saved letter from the saved dictionary. – Blckknght Mar 29 '13 at 21:02
  • @Blckknght I think that wouldn't work if the deleted word is a prefix of another word in the trie (the second for loop wouldn't have to build a list though, as it's only using the last element afterwards). – omz Mar 29 '13 at 21:19
  • @omz: Yeah, there are some corner cases, but they're easy to resolve. You can see how I implemented it [here](https://gist.github.com/BlckKnght/5275131) (though I wrapped nested dicts in a class). – Blckknght Mar 30 '13 at 03:45
  • @Blckknght That link seems to be broken. – omz Mar 30 '13 at 03:46
  • Nice, thanks! But you have a typo in line 32, should be `del current["end"]`. – omz Mar 30 '13 at 19:14
  • Thank you guys! Blckknght, the idea of looking for the target to delete is really impressing! :) – John Berry Apr 01 '13 at 17:54
4

I think it is better to do it recursively, code as following:

def remove(self, word):
    self.delete(self.tries, word, 0)

def delete(self, dicts, word, i):
    if i == len(word):
        if 'end' in dicts:
            del dicts['end']
            if len(dicts) == 0:
                return True
            else:
                return False
        else:
            return False
    else:
        if word[i] in dicts and self.delete(dicts[word[i]], word, i + 1):
            if len(dicts[word[i]]) == 0:
                del dicts[word[i]]
                return True
            else:
                return False

        else:
            return False
1
def remove_a_word_util(self, word, idx, node):
    if len(word) == idx:
        node.is_end_of_word = False
        return bool(node.children)

    ch = word[idx]
    if ch not in node.children:
        return True

    flag = self.remove_a_word_util(word, idx+1, node.children[ch])
    if flag:
        return True

    node.children.pop(ch)
    return bool(node.children) or node.is_end_of_word
himanshu sangal
  • 41
  • 1
  • 11
0

One method of handling structures like this is through recursion. The great thing about recursion in this case is that it zips to the bottom of the trie, then passes the returned values back up through the branches.

The following function does just that. It goes to the leaf and deletes the _end value, just in case the input word is a prefix of another. It then passes up a boolean (boo) which indicates that the current_dict is still in an outlying branch. Once we hit a point where the current dict has more than one child, we delete the appropriate branch and set boo to False so each remaining recursion will do nothing.

def trie_trim(term, trie=SYNONYMS, prev=0):
    # checks that we haven't hit the end of the word
    if term:
        first, rest = term[0], term[1:]
        current_length = len(trie)
        next_length, boo = trie_trim(rest, trie=trie[first], prev=current_length)

        # this statement avoids trimming excessively if the input is a prefix because
        # if the word is a prefix, the first returned value will be greater than 1
        if boo and next_length > 1:
            boo = False

        # this statement checks for the first occurrence of the current dict having more than one child
        # or it checks that we've hit the bottom without trimming anything
        elif boo and (current_length > 1 or not prev):
            del trie[first]
            boo = False

        return current_length, boo

    # when we do hit the end of the word, delete _end
    else:
        del trie[_end]
        return len(trie) + 1, True
Eric Ed Lohmar
  • 1,832
  • 1
  • 17
  • 26
0

A bit of a long one, but I hope this helps answer your question:

class Trie:
    WORD_END = "$"
    
    def __init__(self):
        self.trie = {}

    def insert(self, word):
        cur = self.trie
        for char in word:
            if char not in cur:
                cur[char] = {}
            cur = cur[char]
        cur[Trie.WORD_END] = word

    def delete(self, word):
        def _delete(word, cur_trie, i=0):
            if i == len(word):
                if Trie.WORD_END not in cur_trie:
                    raise ValueError("'%s' is not registered in the trie..." %word)
                cur_trie.pop(Trie.WORD_END)
                if len(cur_trie) > 0:
                    return False
                return True
            if word[i] not in cur_trie:
                raise ValueError("'%s' is not registered in the trie..." %word)
            cont = _delete(word, cur_trie[word[i]], i+1)
            if cont:
                cur_trie.pop(word[i])
                if Trie.WORD_END in cur_trie:
                    return False
                return True
            return False
        _delete(word, self.trie)

t = Trie()
t.insert("bar")
t.insert("baraka")
t.insert("barakalar")

t.delete("barak") # raises error as 'barak' is not a valid WORD_END although it is a valid path.
t.delete("bareka") # raises error as 'e' does not exist in the path.
t.delete("baraka") # deletes the WORD_END of 'baraka' without deleting any letter as there is 'barakalar' afterwards.
t.delete("barakalar") # deletes until the previous word (until the first Trie.WORD_END; "$" - by going backwards with recursion) in the same path (until 'baraka').
0

In case you need the whole DS:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.wordCounter = 0
        self.prefixCounter = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            
            node.prefixCounter += 1                  
            node = node.children[char] 

        node.wordCounter += 1

    def countWordsEqualTo(self, word: str) -> int:
        node = self.root
        if node.children:
            for char in word:
                node = node.children[char]               
        else:
            return 0
            
        return node.wordCounter
 
    def countWordsStartingWith(self, prefix: str) -> int:
        node = self.root
        if node.children:
            for char in prefix:
                node = node.children[char]               
        else:
            return 0

        return node.prefixCounter

    def erase(self, word: str) -> None:
        node = self.root
        for char in word:
            if node.children:
                node.prefixCounter -= 1
                node = node.children[char]
            else:
                return None

        node.wordCounter -= 1

        if node.wordCounter == 0:
            self.dfsRemove(self.root, word, 0)

    def dfsRemove(self, node: TrieNode, word: str, idx: int) -> None:
        if len(word) == idx:
            node.wordCounter = 0
            return

        char = word[idx]
        if char not in node.children:
            return

        self.dfsRemove(node.children[char], word, idx+1)
        
        node.children.pop(char)
        
            



trie = Trie();
trie.insert("apple");                     #// Inserts "apple".
trie.insert("apple");                     #// Inserts another "apple".
print(trie.countWordsEqualTo("apple"))    #// There are two instances of "apple" so return 2.
print(trie.countWordsStartingWith("app")) #// "app" is a prefix of "apple" so return 2.
trie.erase("apple")                       #// Erases one "apple".
print(trie.countWordsEqualTo("apple"))    #// Now there is only one instance of "apple" so return 1.
print(trie.countWordsStartingWith("app")) #// return 1
trie.erase("apple");                      #// Erases "apple". Now the trie is empty.
print(trie.countWordsEqualTo("apple"))    #// return 0
print(trie.countWordsStartingWith("app")) #// return 0

0

I would argue that this implementation is the most succinct and easiest to understand after a bit of staring.

        def removeWord(word, node=None):
            if not node:
                node = self.root
            if word == "":
                node.isEnd = False
                return

            newnode = node.children[word[0]]
            removeWord(word[1:], newnode)
            if not newnode.isEnd and len(newnode.children) == 0:
                del node.children[word[0]]

Although it's a little tricky to understand with the default parameter node=None at first, this is the most succinct implementation of a Trie removal that handles marking the word node.isEnd = False while also pruning extraneous nodes.

  • The method is first called as Trie.removeWord("ToBeDeletedWord").
  • In subsequent recursion calls, a node tied to the corresponding letter ("T" then "o" then "B" then "e" etc. etc.) is added to the next recursion (e.g "remove 'oBeDeletedWord' with the node at T").
  • Once we hit the end node that has the full string ToBeDeletedWord , the last recursion calls removeWord("", <node d>)
  • In this last recursion call, we mark node.isEnd = False. Later, the node is no longer marked isEnd and it has no children so we can call the delete operator.
  • Once that last recursion call ends, the rest of the recursions (e.g TobeDeletedWor, TobeDeletedWo, TobeDeletedW, etc. etc.) will then observe that it too is not an end node and there are no more children. These nodes will also delete.

You will have to read this a couple of times but this implementation is concise, readable, and correct. The difficulty is that the recursion happens midfunction rather than at the beginning or end.

DFeng
  • 347
  • 3
  • 5
0

TL;DR

class TrieNode:
    children: dict[str, "TrieNode"]

    def __init__(self) -> None:
        self.children = {}
        self.end = False

    def __contains__(self, char: str) -> bool:
        return char in self.children

    def __getitem__(self, __name: str) -> "TrieNode":
        return self.children[__name]

    def __setitem__(self, __name: str, __value: "TrieNode") -> None:
        self.children[__name] = __value

    def __len__(self):
        return len(self.children)

    def __delitem__(self, __name: str):
        del self.children[__name]


class Trie:
    def __init__(self, words: list[str]) -> None:
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word: str):
        curr = self.root
        for c in word:
            curr = curr.children.setdefault(c, TrieNode())
        curr.end = True

    def remove(self, word: str):
        def _remove(node: TrieNode, index: int):
            if index >= len(word):
                node.end = False
                if not node.children:
                    return True

            elif word[index] in node:
                if _remove(node[word[index]], index + 1):
                    del node[word[index]]

        _remove(self.root, 0)
Lajos
  • 2,549
  • 6
  • 31
  • 38