I'm new to machine learning, and I need to write an application that checks whether a name is spelled correctly or misspelled.

Can you give me some advice on where I should begin? Which algorithm is best suited to this case?

Giuseppe Pes
  • This may guide you in the right direction: http://stackoverflow.com/questions/2294915/what-algorithm-gives-suggestions-in-a-spell-checker – Zia Sep 11 '12 at 11:14
  • One more source: [AT&T Archive: THE UNIX Operating system](http://youtu.be/tc4ROCJYbm0). In the video, the presenter shows how to build a simple spell-check program just by gluing small Unix programs together with pipes. If the problem is not huge, that is a simple way to do it. Check it out! – Hotloo Xiranood Sep 11 '12 at 19:37

3 Answers

If checking spelling is all you need to do, you can build a hash set of all the words from a freely available dictionary and then check whether the typed word is in it. Are there any other requirements for your task?
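
The lookup idea can be sketched in a few lines of Python. This is only a minimal illustration: the function names `load_dictionary` and `is_known` and the tiny word list are made up for the example, and a real application would read the words from a dictionary or name file instead.

```python
def load_dictionary(lines):
    """Build a set of lowercase words from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_known(word, dictionary):
    """Return True if the word is in the dictionary set (case-insensitive)."""
    return word.strip().lower() in dictionary


# Hypothetical word list; in practice this would come from a file,
# e.g. load_dictionary(open(path_to_word_list)).
sample_lines = ["Giuseppe\n", "Ivan\n", "Peter\n", "\n", "Stuart\n"]
names = load_dictionary(sample_lines)

print(is_known("Peter", names))  # True
print(is_known("Petre", names))  # False
```

A set gives average constant-time membership tests, so detection stays fast even with a large word list. It only detects errors, though; it cannot suggest a correction.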

Ivan Koblik
  • I have to implement spelling error detection and spelling error correction. I may face these kinds of problems: non-name errors, typographical errors, and homophones. The application should implement a machine learning algorithm. – Giuseppe Pes Sep 11 '12 at 13:51
  • Then I would recommend reading the post that Zia linked to. The part most relevant to you is this answer: http://stackoverflow.com/a/2294926/51260. Additionally, please take a look at [this chapter](http://nlp.stanford.edu/IR-book/html/htmledition/dictionaries-and-tolerant-retrieval-1.html) from [Introduction to Information Retrieval](http://www-nlp.stanford.edu/IR-book/). – Ivan Koblik Sep 11 '12 at 13:59
  • You can also watch the Coursera lectures on [Natural Language Processing](https://class.coursera.org/nlp/lecture/preview/index) given by Dan Jurafsky and Chris Manning. Search for Week 2 - Spelling Correction. – Ivan Koblik Sep 11 '12 at 14:07
  • Do I always need to use a dictionary? I ask because I have to correct proper names and surnames. Should I define P(C) based on the most common names? – Giuseppe Pes Sep 11 '12 at 18:04
  • Sorry I'm not familiar with the term P(C). I'd say with some algorithms it's possible to accept unknown words, proper names for example. But I wouldn't think that it's possible to suggest a correct spelling if you've never seen it. – Ivan Koblik Sep 11 '12 at 18:57
  • You're helping me a lot, thanks! However, I think I will use a data set of all English names and surnames and the Damerau-Levenshtein distance to measure how far apart two words are. It may work correctly ... – Giuseppe Pes Sep 11 '12 at 19:26
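
The approach from the last comment, comparing a typed name against a list of known names by edit distance, might look roughly like the sketch below. Note the assumptions: it implements the optimal string alignment variant (often called restricted Damerau-Levenshtein), not the unrestricted algorithm; the names `osa_distance` and `closest_names`, the `max_distance` threshold, and the name list are all invented for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions and transpositions of two
    adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ui" <-> "iu".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def closest_names(word, known_names, max_distance=2):
    """Return (distance, name) pairs within max_distance, closest first."""
    word = word.lower()
    scored = [(osa_distance(word, name.lower()), name) for name in known_names]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


# Hypothetical name list standing in for a real data set of names.
KNOWN_NAMES = ["Giuseppe", "Ivan", "Peter", "Stuart", "Atilla"]

print(closest_names("Guiseppe", KNOWN_NAMES))  # [(1, 'Giuseppe')]
print(closest_names("Petre", KNOWN_NAMES))     # [(1, 'Peter')]
```

Comparing against every known name is linear in the size of the list, which may be too slow for a large data set; the linked chapter on tolerant retrieval discusses ways to narrow down the candidates first.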

Peter Norvig and Stuart Russell's book "Artificial Intelligence: A Modern Approach" would be a good place to start.

rtrt

I suggest starting with Peter Norvig's article "How to Write a Spelling Corrector". It explains the basic ideas behind a spelling corrector and provides Python code. From the article:

"What I wanted to do here is to develop, in less than a page of code, a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second."

According to the article, "The full details of an industrial-strength spell corrector are quite complex." You may start from its references. I think whatever you implement needs to have better accuracy and performance than this toy implementation.
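
To give a feel for the basic idea, here is a simplified sketch along the lines of that article, not the article's own code: generate every string one edit away from the input, keep those that are known words, and pick the most frequent one. The frequency table plays the role of the prior P(C) mentioned in the comments above. The counts in `NAME_COUNTS` are made up for illustration, and the function names are assumptions of this example.

```python
from collections import Counter
import string

# Hypothetical counts of how often each name occurs in some corpus.
NAME_COUNTS = Counter({"peter": 120, "petra": 30, "pete": 15, "stuart": 40})


def edits1(word):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts=NAME_COUNTS):
    """Return the most frequent known word within one edit, else the input."""
    word = word.lower()
    if word in counts:
        return word  # already a known word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing known nearby; leave the word unchanged
    return max(candidates, key=lambda w: counts[w])


print(correct("petre"))   # peter
print(correct("stuatr"))  # stuart
```

For "petre", the candidates "peter", "petra" and "pete" are all one edit away, and the frequency counts break the tie in favour of "peter". A word with no known neighbour is returned unchanged, which matches the earlier point that a correction cannot be suggested for a name that has never been seen.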

Atilla Ozgur