2

I have a table dictionary which contains a list of words Like:

   ID|word
   ---------
    1|hello
    2|google
    3|similar
    ...

so i want if somebody writes a text like

"helo iam looking for simlar engines for gogle".

Now I want to check every word if it exists in the database, if not it should get me the similar word for the word. For example: helo = hello, simlar = similar, gogle = google. Well, i want to fix the spelling errors. In my database i have a full dictionary of all english words. I coudn't find any mysql function which helps me. LIKE isn't helpfull in my situation.

Name
  • 139
  • 2
  • 9

3 Answers3

7

you can use soundex() function for comparing phonetically

your query should be something like:

select * from table where soundex(word) like soundex('helo');

and this will return you the hello row

Nir Levy
  • 12,750
  • 3
  • 21
  • 38
  • 2
    Nice one. Never heard of that before. It's not going to solve all the questioner's problems, but I'm voting this up anyway :) – Pete Feb 04 '16 at 21:48
  • it does not work on driver - soundex(client_first_name) like soundex('drier') – user2573099 Jan 22 '23 at 05:32
3

There is a function that does roughly want you want, but it's intensive and will slow queries down. You might be able to use in your circumstances, I have used it before. It's called Levenshtein. You can get it here How to add levenshtein function in mysql?

Community
  • 1
  • 1
Pete
  • 1,289
  • 10
  • 18
  • Oh, great. Do bear in mind the slowness you might get on bigger queries. And please accept my answer if you like it. Thanks. – Pete Feb 04 '16 at 21:49
3

What you want to do is called a fuzzy search. You could use the SOUNDEX function in MySQL, documented here:

http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex

You query would look like:

SELECT * FROM dictionary where SOUNDEX(word) = SOUNDEX(:yourSearchTerm)

... where your search term is bound to the :yourSearchTerm parameter value.

A next step would be to try implementing and making use of a Levenshtein function in MySQL. One is described here:

http://www.artfulsoftware.com/infotree/qrytip.php?id=552

The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation may be insertion, deletion or substitution of one character.

You might also consider looking into databases that are aimed at full text searching, such as Elastic Search, which provides this natively:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-fuzzy-query.html

mmccaff
  • 1,281
  • 1
  • 11
  • 20
  • 1
    I don't like to downvote, but you'll get down votes for that line of "code" because of not sanitising the incoming text. – Pete Feb 04 '16 at 21:51
  • 1
    I thought the same - I was quickly writing pseudo-code to illustrate the general idea. I edited it to suggest using parameter binding. – mmccaff Feb 04 '16 at 21:52
  • I like this answer. Is sanitizing the code a requirement on SO? I believe it's on the implementer of the proposed features to handle sanitization and other data processing procedures. From this answer, I was able to gather two different approaches which I'll do my own research on and implement them how I see fit. Thank you! – dchayka Jan 08 '20 at 21:42