I have a fairly long list of phrases, so I also have a text box that users can type into as a search bar. Right now, phrases that do not contain the exact text typed into the search bar are filtered out. However, I would like it to suggest a few phrases that the user might have meant.
Note: I am not looking for a "Did you mean..." or spell-checking algorithm like the ones here or here or here (though this image from the first link seems good); I want an algorithm that can suggest the best match for an incomplete word or phrase; e.g. the word "bat" should match the word "battery" better than it matches the word "car".
It would also be impractical to use Google's method of returning the few most common strings that start with (approximately) the same letters, because, as far as I know, each element in the list is as common as any other.
Also, I would like to do this in Java (8); however, answers in other languages are acceptable, as long as they do not use built-in functions for which Java has no equivalent. In case it is useful, I wrote a modified version of Levenshtein distance (below) which pads the search string with asterisks signifying "any character." This works for single words, e.g. "mud" is a perfect match for "muddy", but it isn't good enough when considering that people may use "car" to search for "race car".
/**
 * <ul>
 * <b><i>searchDistance</i></b><br>
 * <br>
 * <code> public static int searchDistance(String key, String match, boolean useAsterisk)</code><br>
 * <br>
 * Gets the Levenshtein distance between <code>key</code> and <code>match</code>. <br>
 * If <code>useAsterisk</code> is true, then the following applies: If <code>key</code> is shorter than <code>match</code>, the asterisk <code>'*'</code> is appended to it until the lengths are equal. Asterisks can be used in <code>key</code> to signify 'any character.'
 * @param key - The text to search for
 * @param match - The text to compare <code>key</code> against
 * @param useAsterisk - Whether or not to use asterisks for the purpose described above
 * @return the Levenshtein distance between <code>key</code> and <code>match</code>.
 * </ul>
 */
public static int searchDistance(String key, String match, boolean useAsterisk) {
    if (useAsterisk) {
        // Pad key with wildcards until it is as long as match.
        StringBuilder padded = new StringBuilder(key);
        while (padded.length() < match.length()) {
            padded.append('*');
        }
        key = padded.toString();
    }
    int[][] matrix = new int[key.length() + 1][match.length() + 1];
    // Distance from the empty string: i deletions or i insertions.
    for (int i = 0; i < matrix.length; i++) {
        matrix[i][0] = i;
    }
    for (int i = 0; i < matrix[0].length; i++) {
        matrix[0][i] = i;
    }
    for (int a = 1; a < matrix.length; a++) {
        for (int b = 1; b < matrix[0].length; b++) {
            char k = key.charAt(a - 1);
            // A wildcard in key matches any character, but only when useAsterisk is set.
            boolean same = k == match.charAt(b - 1) || (useAsterisk && k == '*');
            matrix[a][b] = Math.min(
                    Math.min(matrix[a - 1][b] + 1, matrix[a][b - 1] + 1),
                    matrix[a - 1][b - 1] + (same ? 0 : 1));
        }
    }
    return matrix[matrix.length - 1][matrix[0].length - 1];
}
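To make the limitation concrete, here is a small sketch that runs the method on the examples from this question. The class name `SearchDistanceDemo` and the extra candidate "cat" are only assumptions for illustration; the method is a condensed copy of the one above, with `useAsterisk` guarding both the padding and the wildcard.

```java
class SearchDistanceDemo {
    // Condensed copy of searchDistance; useAsterisk guards the padding and the wildcard.
    static int searchDistance(String key, String match, boolean useAsterisk) {
        if (useAsterisk) {
            StringBuilder padded = new StringBuilder(key);
            while (padded.length() < match.length()) {
                padded.append('*');
            }
            key = padded.toString();
        }
        int[][] matrix = new int[key.length() + 1][match.length() + 1];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i][0] = i;
        }
        for (int j = 0; j < matrix[0].length; j++) {
            matrix[0][j] = j;
        }
        for (int a = 1; a < matrix.length; a++) {
            for (int b = 1; b < matrix[0].length; b++) {
                char k = key.charAt(a - 1);
                boolean same = k == match.charAt(b - 1) || (useAsterisk && k == '*');
                matrix[a][b] = Math.min(
                        Math.min(matrix[a - 1][b] + 1, matrix[a][b - 1] + 1),
                        matrix[a - 1][b - 1] + (same ? 0 : 1));
            }
        }
        return matrix[key.length()][match.length()];
    }

    public static void main(String[] args) {
        System.out.println(searchDistance("mud", "muddy", true));    // 0: prefix of a single word
        System.out.println(searchDistance("bat", "battery", true));  // 0
        System.out.println(searchDistance("bat", "car", true));      // 2
        System.out.println(searchDistance("car", "cat", true));      // 1
        System.out.println(searchDistance("car", "race car", true)); // 2: "car" is not at the start
    }
}
```

So for the search "car", the hypothetical entry "cat" (distance 1) would be ranked ahead of "race car" (distance 2), which is exactly the behavior I am trying to avoid.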
TL;DR: Is there a good way to give completion suggestions for search terms?
Thanks in advance!