
I am writing a search script that must correct misspelled words. For example, if the dictionary contains "book" and the user types "brok", I want my program to show the corrected form. All the words are stored in a database table with just two fields: an id and a word. I used REGEXP to find candidates and then computed the Levenshtein distance to all of them to find the closest one, but that took too long on a 5-million-row database. How can I make this faster? I read about FULLTEXT search in MySQL, but as far as I understand it only matches correctly spelled words. My second requirement is that all the words matter. For example, for the input 'brok winter' I want to get the text 'book winter', even if there is another value 'brok' in my database. In other words, I do not want to take each input word separately, find its closest match, and then query for the results that contain the most words. I want to search for all the input words at once so that the corrections can depend on each other.
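A minimal sketch of the "all words matter" requirement, written in Python for illustration: instead of correcting each word independently, each stored phrase is scored against the whole input with a word-level edit distance. Replacing one word by another costs their character-level Levenshtein distance, and dropping or adding a whole word costs its length. That cost model and the names `levenshtein`, `phrase_distance` and `best_phrase` are assumptions made for this example. It still compares the input with every stored phrase, so on its own it does not solve the 5-million-row performance problem; it is meant to rank a short candidate list that a faster pre-filter has already produced.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def phrase_distance(query: list[str], phrase: list[str]) -> int:
    """Word-level edit distance (assumed cost model): aligning two words costs
    their Levenshtein distance; an unmatched word costs its length."""
    prev = [0]
    for w in phrase:
        prev.append(prev[-1] + len(w))
    for q in query:
        cur = [prev[0] + len(q)]
        for j, w in enumerate(phrase, start=1):
            cur.append(min(
                prev[j] + len(q),                 # query word q has no partner
                cur[j - 1] + len(w),              # phrase word w has no partner
                prev[j - 1] + levenshtein(q, w),  # align q with w
            ))
        prev = cur
    return prev[-1]


def best_phrase(query: str, phrases: list[str]) -> str:
    """Return the stored phrase closest to the whole query (hypothetical helper)."""
    words = query.lower().split()
    return min(phrases, key=lambda p: phrase_distance(words, p.lower().split()))


print(best_phrase("brok winter", ["brok", "book winter", "broke summer"]))
```

With the stored values 'brok', 'book winter' and 'broke summer', the input 'brok winter' selects 'book winter' (total cost 1), because matching 'brok' exactly would leave 'winter' unmatched at a cost of 6.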

Dqans
  • Please provide a sample of the code that you tried; it may be possible to optimise it. Also, Stack Overflow usually isn't too keen on such open questions. – Mathieu VIALES Oct 07 '17 at 09:47
  • Plus, you used the MVC5/asp.net-mvc-5 tag along with the PHP tag. Is this question related to BOTH PHP AND MVC5? MVC5 does not refer to the architecture but to the Microsoft-made .NET Framework ... – Mathieu VIALES Oct 07 '17 at 09:48
  • [this post](https://stackoverflow.com/a/16825347/6838730) may help you, too – Mathieu VIALES Oct 07 '17 at 09:48
  • Spell correction is not an easy task. Most methods rely on precalculation, e.g. store "brok" together with all possible replacements like "book", "broke", "grok", ...; or e.g. delete one (or x) letters and store these, like "book", "ook", "bok", "boo", and when the user inputs "brok", do the same and look up "brok", "rok", "bok", "brk", "bro", which finds "bok" (this finds a Levenshtein distance of up to 1 delete + 1 edit). More clever algorithms take into account the context (like Google autocomplete) or the grammar. Find and try to implement a method/library, and ask again (with code) if you get stuck. – Solarflare Oct 07 '17 at 09:49
  • As mentioned by Solarflare, auto-correct is a hard thing to implement from an algorithmic point of view. If all you want is to have a working auto-correct, go for a pre-made library. – Mathieu VIALES Oct 07 '17 at 09:51
  • @Solarflare My code is not doing much right now. It just takes the potential words from a txt file using a regexp in PHP, and then I compute the Levenshtein distance for all the found words. The txt file contains the same words as the database, but I saw that regexps are faster on txt files. Let me make sure I understand: if you want to select all potential words for a user-entered word, you must select all the words you have and then find the closest one, right? How can I do this without a REGEXP? – Dqans Oct 07 '17 at 10:02
  • No, that's not what I was saying. I merely said that most algorithms will require some precalculation (e.g. store correct and incorrect words), and that you *could* take context/grammar into account (which is a lot more complicated). I doubt you can do any of that with regexp, though. You might be able to find some library that implements it, but stackoverflow is neither here to find a library nor to write the code for you. You will have to find a method/algorithm you like, try that on your own, and if you get stuck, ask for help for a specific problem with your code (and include your code). – Solarflare Oct 07 '17 at 10:19
  • You may get some inspiration at [How does Google Instant work?](https://stackoverflow.com/q/3670831/6248528) (as that is basically what you are trying to do), or anything you can find in a search engine of your choice when you ask it how google instant works - or, actually, any misspelled version of that question too. – Solarflare Oct 07 '17 at 10:25
  • @Solarflare I just want some advice, because I have been trying to find a solution to this for over 2 weeks. I do not want any code, I just want an idea. For example, how would you do it? If REGEXP is not the way to go, how can I select the closest words? For example, with the same database as in my question, let's say I have a title 'the book of the year' and the user types in 'brok'. How can I select this title without REGEXP? And I think I must correct the word first and then run the query to extract the title, right? – Dqans Oct 07 '17 at 12:40
  • As I said: questions like "how would you do it?" do not fit on Stack Overflow, as "I" would do it differently than user29. There is no correct answer, while asking for/listing all options is too broad. It also depends on your requirements (e.g. context search, database size) or restrictions (e.g. skill, budget, storage/RAM). You can bing/yandex/google/baidu for algorithms that do it. I described 2 (of many) in my comment. E.g.: store "the he te th book ook bok boo of o f year ear yar yer yea". If you input "brok", search for "brok", "rok", "bok", "brk" and "bro" (e.g. with a fulltext search). – Solarflare Oct 07 '17 at 14:54
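The deletion-based precalculation Solarflare describes in the comments (often called "symmetric delete") can be sketched as follows, in Python for illustration. An in-memory dict stands in for what would be an indexed table or FULLTEXT column in MySQL; `deletes`, `build_index` and `suggest` are hypothetical names, and `max_dist=1` is an assumed limit. Each dictionary word is stored under every string obtained by deleting up to `max_dist` characters; a query generates its own deletions and looks them up, so only a handful of candidates need a real Levenshtein check instead of all 5 million words. The check is still needed, because sharing a deletion does not guarantee the words are within the limit (for example, "book" and "boko" share "bok" but are 2 edits apart).

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def deletes(word: str, max_dist: int) -> set[str]:
    """The word itself plus every string made by deleting up to max_dist characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_dist):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(words, max_dist: int = 1) -> dict[str, set[str]]:
    """Precalculation: map every deletion variant to the dictionary words producing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in words:
        for variant in deletes(word, max_dist):
            index[variant].add(word)
    return index


def suggest(term: str, index, max_dist: int = 1) -> list[str]:
    """Look up the query's own deletion variants, then verify with real edit distance."""
    candidates: set[str] = set()
    for variant in deletes(term, max_dist):
        candidates |= index.get(variant, set())
    scored = sorted((levenshtein(term, c), c) for c in candidates)
    # Keep only true matches, closest first, ties broken alphabetically.
    return [c for dist, c in scored if dist <= max_dist]


index = build_index(["book", "broke", "brook", "grok", "winter"])
print(suggest("brok", index))
```

Here "brok" yields `['book', 'broke', 'brook', 'grok']`, all at distance 1. The trade-off is storage: the number of stored variants per word grows combinatorially with `max_dist`, which is why such indexes are usually limited to small edit distances.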
