2

I need to create a spell checker in C for my assessment. I have managed to start the work: I have a text file that contains the whole dictionary, and I have written code to read the files and to compare a chosen file with the dictionary file. Now I need to write to a new text file where the misspelled words were found, what they are, and their correct versions. The big problem is that I have no idea how to do this. Right now my code can only tell that there is a difference between the files, but I don't know how to make strcmp check string by string, word by word, whether something is wrong.

The dictionary file contains all the words, so of course, if my program reads the other file, compares the two, and then writes every word that isn't in that file to the new output file as an error, the output will also contain random words that aren't in the text file or connected with it at all.

I hope I explained my problem well and that somebody can tell me how to fix it. I am not even asking for the code; I just need some idea of how to code the rest of the program. And sorry for my English; it's my second language, so I still make grammatical mistakes.

Kami
  • 1
    You need to come up with a data structure that will allow you to identify a string easily, won't be a huge burden on memory, and will give you close matches to boot. A possible one for this task is a [suffix tree](https://en.wikipedia.org/wiki/Suffix_tree). You will need to load the dictionary file into it, and make use of it while reading the test file. But your question is off-topic, because as it is, it's still too broad. – StoryTeller - Unslander Monica Feb 05 '17 at 14:35
  • 1
    Can you just post your code? – Kedar Kodgire Feb 05 '17 at 14:35
  • See this: http://stackoverflow.com/questions/346757/how-do-spell-checkers-work?rq=1 – Suraj Jain Feb 05 '17 at 14:40

1 Answer

3

Here are some steps you can follow:

  • read the dictionary into a memory structure, for example an array of strings, which you will sort in lexicographical order (with strcmp).

  • read the file line by line, and for each line iterate these steps:

    • initialize a highlight line with spaces, the same length as the line read.

    • skip characters that cannot be part of a word with strcspn(), and save the resulting index i.

    • scan the characters that can be part of a word with strspn(), and save this number n.
    • if n is 0, this is the end of the line.
    • look up the word at index i with n chars in the dictionary (potentially ignoring case).
    • if the word cannot be found, set the corresponding characters in the highlight line to ^ characters.
    • update the index i += n and iterate.
    • if at least one word was not found in the line, output the line and the highlight line.

Study these standard functions:

  • strspn()
  • strcspn()
  • qsort()
  • bsearch()
chqrlie
  • Thank you very much, I managed to do the spellcheck with the dictionary file and an example file :) Now I just have a problem with removing punctuation :/ – Kami Feb 21 '17 at 10:55