0

A BK Trees (Burkhard-Keller Trees) is associated with fuzzy string searches (e.g. spell check, word recommendations). And all the BK Trees searching algorithm are the same as explained here. The aim is to return, for example, "seek" and "peek" if I search for "aeek".

Now, my question is, I'm trying to use this fuzzy string searches algorithm to search for all similar items from the given dictionary. For example, given a word "seek", I want to find all similar words, like "peek", "geek", "seat", etc within the dictionary. However I found the BK Trees searching algorithm that everyone uses is not designed for that.

Take a look my sample test result here. I found that the dictionary will be different if the feeding words order is different, thus the search result can be different as well.

What I want is, using my above sample test, given any of the four Python books, a SearchAll function will always return the four Python books, despite the order the dictionary is built, or the order the search is done.

However, I've tried many ways but all failed (e.g., this is one of them). Now I'm throwing up my hands and asking for help. A pseudo code or generic algorithm describing would do. Thx.

xpt
  • 20,363
  • 37
  • 127
  • 216

1 Answers1

1

You have an integer overflow on lines 77 and 106 of bktree.go:

k := d - r

Since the types of d and r are uint8, the type of k is also uint8, so when d < r, k ends up larger than d + r, and the following cycle doesn't execute.

You can fix it like this:

k := int16(d) - int16(r)
max_k  := int16(d) + int16(r)
if k < 1 {
    k = 1
}
for ; k <= max_k; k++ {
    ...
}
Anton
  • 3,113
  • 14
  • 12
  • Actually, that won't happen, as in lines https://github.com/go-dedup/simhash/blob/master/sho/bktree.go#L73-L74, I.e., when on lines 77 and 106, `d` would be `>= r`. But thanks for taking time looking into it! – xpt Aug 27 '17 at 18:40
  • I went through it with the debugger, and it does happen on the line [106](https://github.com/go-dedup/simhash/blob/master/sho/bktree.go#L106). Moreover, after the fix, `ExampleSearch_filesA`, `ExampleSearch_filesB`, `ExampleSearch_filesS` all output the same result. – Anton Aug 27 '17 at 19:02
  • Oh, there! Thanks A LOT! what debugger are you using BTW? – xpt Aug 27 '17 at 20:24
  • @xpt I was using JetBrains Gogland. – Anton Aug 27 '17 at 20:32
  • thanks @user3290797, sorry that I can only up-vote, because I'm still hoping there is a `SearchAll` function that can always return the four Python books for each of them. – xpt Aug 27 '17 at 20:34