How to find a word in a text with fuzzy matching?

Question

I'm using an OCR engine (Tesseract) to extract data from a document. The document must contain certain keywords to be valid, but OCR isn't perfect, so it may sometimes read, for example, "Technlquos" instead of "Techniques".
Is there a way in Java to find "techniques" in a text even if the OCR read it as "Technlquos"? The same goes for compound terms: searching for "Sciences Techniques" should accept "Sclences Technlquos". I'm looking for something like finding the closest word to the search term and accepting it if it is close enough (a 75% match, for example). I found some solutions here, but none of them answers my question.
Thank you.

*I found some solutions [here](http://stackoverflow.com/questions/327513/fuzzy-string-search-in-java) but none of them is answering my question.* Explain why your problem is different if you want a different solution. — shmosel, May 20 '16 at 19:43
If I have understood the answers correctly, they are for comparing two words, not for searching for one or more words in a text. — hereForLearing, May 20 '16 at 19:47
Sounds like you're concerned about [this](http://stackoverflow.com/questions/327513/fuzzy-string-search-in-java#comment54049910_327595). But there are other solutions there. — shmosel, May 20 '16 at 19:54
Thank you, that's what I need: the bitap algorithm. Add it as an answer so I can accept it. — hereForLearing, May 20 '16 at 21:15
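
The idea discussed in the comments above (accept a keyword if some stretch of the OCR text is close enough to it) can be sketched in Java. This is not the bitap algorithm itself but a plain dynamic-programming approximate substring search (Sellers' variant of Levenshtein distance), which answers the same question. The class name `FuzzyFind`, the method names, and the way the 75% threshold is turned into a score (1 minus edit distance divided by keyword length) are assumptions made for illustration, not something taken from the thread.

```java
import java.util.Locale;

// Hypothetical helper: approximate substring search via Levenshtein distance.
class FuzzyFind {

    /** Smallest edit distance between the pattern and any substring of the text (case-insensitive). */
    static int bestDistance(String text, String pattern) {
        String t = text.toLowerCase(Locale.ROOT);
        String p = pattern.toLowerCase(Locale.ROOT);
        int m = p.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        // Against an empty text, matching the first i pattern characters costs i edits.
        for (int i = 0; i <= m; i++) {
            prev[i] = i;
        }
        int best = prev[m];
        for (int j = 1; j <= t.length(); j++) {
            curr[0] = 0; // a match may start at any position in the text
            for (int i = 1; i <= m; i++) {
                int cost = (p.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
                curr[i] = Math.min(prev[i - 1] + cost,          // match or substitution
                          Math.min(prev[i] + 1, curr[i - 1] + 1)); // extra or missing character
            }
            best = Math.min(best, curr[m]); // a match may end at any position too
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }

    /** True if some part of the text matches the pattern with at least minSimilarity (assumed scoring). */
    static boolean containsFuzzy(String text, String pattern, double minSimilarity) {
        if (pattern.isEmpty()) {
            return true;
        }
        double similarity = 1.0 - (double) bestDistance(text, pattern) / pattern.length();
        return similarity >= minSimilarity;
    }

    public static void main(String[] args) {
        String ocrText = "Faculte des Sclences Technlquos"; // made-up sample OCR output
        System.out.println(containsFuzzy(ocrText, "Techniques", 0.75));
        System.out.println(containsFuzzy(ocrText, "Sciences Techniques", 0.75));
    }
}
```

With a 0.75 threshold both sample searches are accepted: "Technlquos" differs from "Techniques" in 2 of 10 characters, and "Sclences Technlquos" differs from "Sciences Techniques" in 3 of 19. The cost is proportional to text length times keyword length, which should be fine for a handful of keywords per document; bitap would be the faster choice for short patterns.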

score -1 · Answer 1 · answered May 23 '16 at 05:49

In other OCR libraries, this can be done by keeping the recognized word variants in the resulting text. Most likely, your OCR engine did find "Techniques" as a candidate but considered the word suspicious. If there is an option to keep the recognition variants of suspicious words, you will be able to search for the keyword among them.

Nadia Solovyeva

You mean libraries other than Tesseract? I heard that it's the best open-source OCR engine. – hereForLearing May 23 '16 at 09:54
OCR implementations are not limited to open-source ones. – Nadia Solovyeva May 23 '16 at 10:15
In my case it's limited to free ones :/ – hereForLearing May 23 '16 at 10:41
