Questions tagged [bk-tree]

A BK-tree is a metric tree specifically adapted to discrete metric spaces.

A BK-Tree is a data structure for building a "dictionary" of similar words. It can be used to guess that you meant "cat" when you wrote "cta". It works by building a tree with words from a dictionary by using the first word as a root node and then attaching subsequent words with a branch of length d(root_word, new_word) where d is a function for finding the "distance" between two words. This is usually the Levenshtein_ distance, i.e. the minimum number of edits needed to transform one string into the other.

https://en.wikipedia.org/wiki/BK-tree

9 questions
8
votes
1 answer

How do I balance a BK-Tree and is it necessary?

I am looking into using an Edit Distance algorithm to implement a fuzzy search in a name database. I've found a data structure that will supposedly help speed this up through a divide and conquer approach - Burkhard-Keller Trees. The problem is…
6
votes
1 answer

Deleting a node in a BK Tree

I have seen many different implementations of BK Trees in many different languages, and literally none of them seem to include a way to remove nodes from the tree. Even the original article where BK Trees were first introduced does not provide a…
farsil
  • 955
  • 6
  • 19
5
votes
3 answers

How to implement a fast fuzzy-search engine using BK-trees when the corpus has 10 billion unique DNA sequences?

I am trying to use the BK-tree data structure in python to store a corpus with ~10 billion entries (1e10) in order to implement a fast fuzzy search engine. Once I add over ~10 million (1e7) values to a single BK-tree, I start to see a significant…
0x90
  • 39,472
  • 36
  • 165
  • 245
3
votes
3 answers

Understanding BK Trees: How do we derive the (d-n, d+n) range from the triangle inequality?

Reading this post about BK Trees, I found the following snippet a bit confusing: "Assume for a moment we have two parameters, query, the string we are using in our search, and n the maximum distance a string can be from query and still be returned.…
Duck
  • 178
  • 1
  • 11
2
votes
1 answer

Finding Friend of Friend in a Dictionary using Levenshtein distance

Following is what I am trying to do. Two words W1 and W2 are friends if the Levenshtein distance for those words are 1. I am supposed to find all the friend of friend also. I tried to do the same thing with Bk-Tree. It works for small size…
Avinash
  • 12,851
  • 32
  • 116
  • 186
2
votes
1 answer

How optimize BK-Tree

I'm implementing a BK-Tree in Cython. For one million items, the search time is too long! It's ~30 seconds :( Here is my Cython code: # -*- coding: UTF-8 -*- from itertools import imap from PIL import Image DEF MAX_TREE_POOL = 10000 cdef extern…
p0is0n
  • 31
  • 3
0
votes
1 answer

BK - Tree Search All

A BK Trees (Burkhard-Keller Trees) is associated with fuzzy string searches (e.g. spell check, word recommendations). And all the BK Trees searching algorithm are the same as explained here. The aim is to return, for example, "seek" and "peek" if I…
xpt
  • 20,363
  • 37
  • 127
  • 216
0
votes
1 answer

Has this algorithm been implemented properly?

I am currently implementing a BK-Tree to make a spell checker. The dictionary I am working with is very large (millions of words), which is why I cannot afford any inefficiencies at all. However, I know that the lookup function that I wrote…
efficiencyIsBliss
  • 3,043
  • 7
  • 38
  • 44
-1
votes
1 answer

BK-Tree Implementation Insertion time is more how to reduce

Following is my attempt to write BK-Tree , for 150000 word file it takes around 8 seconds Is there any way to reduce this time. Following is my code #include #include #include #include #include
Avinash
  • 12,851
  • 32
  • 116
  • 186