I'm using MYSQL's FULL TEXT search functionality (in Mysql 5.6.33).
If I do a MATCH in NATURAL LANGUAGE mode, for a postcode, with a one-character typo, i get some decent results back, including the results with the "right" postcode, but they're not near the top.
For example, there are 10 schools with postcode "BN2 1TL"
. I deliberately misspell this as "BN2 1TM"
and do a search as follows:
SELECT record_id, address_string,
MATCH (address_string) AGAINST ("BN2 1TM" IN NATURAL LANGUAGE MODE) AS score
FROM schools
WHERE MATCH (address_string) AGAINST ("BN2 1TM" IN NATURAL LANGUAGE MODE) > 0
ORDER BY score DESC;
On closer inspection, it's because the search has bought back all results that have either "BN2"
or "1TM"
in their address_string
column, and they all have exactly the same score, and so are in random order, effectively. .
This is perfectly reasonable behaviour, but it would be great if I could get the score to take "closeness" into account, meaning that, for a search on "BN2 1TM"
, "BN2 1TL"
would be scored more highly than "BN2 3PQ"
. Is there a way to do this?
EDIT: I remembered that this type of closeness is technically called "Levenshtein distance", which is a reference to the Levenshtein algorithm for determining how many replacements are needed to turn one string into another. So I guess my question could be like "Can i get MYSQL FULLTEXT NATURAL LANGUAGE MODE scoring to take Levenshtein distance into account"?